Delphi XE byte array index

I use a simple circular buffer like this:
var
  Values: array [byte] of single;
  ptr: byte;
In this test example
for ptr:=0 to 10 do Values[Byte(ptr-5)]:=1;
I expect the first 6 values and the last 5 values to be set to 1, but the XE4 compiler produces incorrect code; it uses 32-bit pointer math to calculate the array index:
for ptr:=0 to 10 do Values[Byte(ptr-5)]:=1;
005B94BB C645FB00 mov byte ptr [ebp-$05],$00
005B94BF 33C0 xor eax,eax
005B94C1 8A45FB mov al,[ebp-$05]
005B94C4 C78485E0FBFFFF0000803F mov [ebp+eax*4-$0420],$3f800000
005B94CF FE45FB inc byte ptr [ebp-$05]
005B94D2 807DFB0B cmp byte ptr [ebp-$05],$0b
005B94D6 75E7 jnz $005b94bf
Is my code wrong, and what is the proper way to work with byte indexes?

The question is:
Is a wrap expected within the Byte() cast?
Let's compare the disassembly with overflow checking on and off.
{$Q+}
Project71.dpr.21: for ptr:= 0 to 10 do Values[Byte(ptr-5)]:= 1;
0041D568 33DB xor ebx,ebx
0041D56A 0FB6C3 movzx eax,bl
0041D56D 83E805 sub eax,$05
0041D570 7105 jno $0041d577
0041D572 E82D8DFEFF call #IntOver
0041D577 0FB6C0 movzx eax,al
0041D57A C704870000803F mov [edi+eax*4],$3f800000
0041D581 43 inc ebx
0041D582 80FB0B cmp bl,$0b
0041D585 75E3 jnz $0041d56a
{$Q-}
Project71.dpr.21: for ptr:= 0 to 10 do Values[Byte(ptr-5)]:= 1;
0041D566 B30B mov bl,$0b
0041D568 B808584200 mov eax,$00425808
0041D56D C7000000803F mov [eax],$3f800000
0041D573 83C004 add eax,$04
0041D576 FECB dec bl
0041D578 75F3 jnz $0041d56d
With {$Q+} the wrap works, while with {$Q-} it does not, and the compiler does not generate a range error for the out-of-bounds array indexing even when {$R+} is set.
So, to me, the conclusion is: since range checking does not raise a run-time error for an out-of-bounds array index, a wrap is expected.
This is further supported by the fact that a wrap is performed when overflow checking is on.
This should be reported as a bug in the compiler.
Done: https://quality.embarcadero.com/browse/RSP-15527 "Type cast fail within array indexing"
Note: a workaround is given by @Rudy in his answer.
Addendum:
The following code:
for ptr:= 0 to 10 do WriteLn(Byte(ptr-5));
generates:
251
252
253
254
255
0
1
2
3
4
5
for all combinations of range/overflow checking.
Likewise Values[Byte(-1)] := 1; assigns 1 to Values[255] for all compiler options.
The documentation for Value Typecasts says:
The resulting value is obtained by converting the expression in parentheses. This may involve truncation or extension if the size of the specified type differs from that of the expression. The expression's sign is always preserved.
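For reference, here is a minimal, self-contained console reproducer (a sketch; the program name is mine) showing that the cast wraps in WriteLn but the array indexing does not:

program ByteWrapRepro;
{$APPTYPE CONSOLE}
var
  Values: array[Byte] of Single;
  Ptr: Byte;
begin
  for Ptr := 0 to 10 do
    WriteLn(Byte(Ptr - 5));        // prints 251..255 then 0..5: the cast wraps
  for Ptr := 0 to 10 do
    Values[Byte(Ptr - 5)] := 1.0;  // with {$Q-} the generated index does not wrap (the bug)
end.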

My code is written in Delphi 10.1 Berlin, but the result seems to be the same.
Let's extend your little code piece a bit:
procedure Test;
var
  Values: array[Byte] of Single;
  Ptr: Byte;
begin
  Values[0] := 1.0;
  for Ptr := 0 to 10 do
    Values[Byte(Ptr - 5)] := 1.0;
end;
This gives the following code in the CPU view:
Project80.dpr.15: Values[0] := 1.0;
0041A1DD C785FCFBFFFF0000803F mov [ebp-$00000404],$3f800000
Project80.dpr.16: for Ptr := 0 to 10 do
0041A1E7 C645FF00 mov byte ptr [ebp-$01],$00
Project80.dpr.17: Values[Byte(Ptr-5)] := 1.0;
0041A1EB 33C0 xor eax,eax
0041A1ED 8A45FF mov al,[ebp-$01]
0041A1F0 C78485E8FBFFFF0000803F mov [ebp+eax*4-$0418],$3f800000
0041A1FB FE45FF inc byte ptr [ebp-$01]
Project80.dpr.16: for Ptr := 0 to 10 do
0041A1FE 807DFF0B cmp byte ptr [ebp-$01],$0b
0041A202 75E7 jnz $0041a1eb
As we can see, the first element of the array is at [ebp-$00000404], so [ebp+eax*4-$0418] is indeed below the array (for values 0..4).
That looks like a bug to me, because for Ptr = 0, Byte(Ptr - 5) should wrap around to $FB. The generated code should be something like:
mov byte ptr [ebp-$01],$00
xor eax,eax
#loop:
mov al,[ebp-$01]
sub al,5 // Byte(Ptr - 5)
mov [ebp+4*eax-$0404],$3f800000 // al = $FB, $FC, $FD, $FE, $FF, 00, etc..
inc byte ptr [ebp-$01]
cmp byte ptr [ebp-$01],$0b
jnz #loop
Good find!
There is a workaround, though:
Values[Byte(Ptr - 5) + 0] := 1.0;
This produces:
Project80.dpr.19: Values[Byte(Ptr - 5) + 0] := 1.0;
0040F16B 8A45FF mov al,[ebp-$01]
0040F16E 2C05 sub al,$05
0040F170 25FF000000 and eax,$000000ff
0040F175 C78485FCFBFFFF0000803F mov [ebp+eax*4-$0404],$3f800000
And that works nicely, although the and eax,$000000ff seems unnecessary to me.
FWIW, I also looked at the code generated with optimization on. Both in XE and Berlin, the error exists as well, and the workaround works too.

This sounds like unexpected compiler behavior. But I would never assume that casting an integer with byte() always wraps at $ff. It does most of the time, e.g. when you assign values between variables, but there are cases where it doesn't, as you discovered. So I would never have used this byte() cast within an array index computation.
In my experience, using byte variables is not worth it; you should rather use a plain integer (or NativeInt) so that it matches the CPU registers, and then not rely on any implicit wrapping.
In all cases, I would rather make the masking to 0..255 explicit, like this:
procedure test;
var
  Values: array [byte] of single;
  ptr: integer;
begin
  for ptr := 0 to 10 do
    Values[(ptr-5) and high(Values)] := 1;
end;
As you can see, I've made some modifications:
Define the for loop index as an integer, to use a CPU register;
Use the and operation for a fast binary wrap (writing (ptr-5) mod 256 would be much slower);
Use high(Values) instead of a hard-coded $ff constant, which makes it clear where the wrap comes from.
Then the generated code is quick and optimized:
TestAll.dpr.114: begin
0064810C 81C400FCFFFF add esp,$fffffc00
TestAll.dpr.115: for ptr:=0 to 10 do Values[(ptr-5) and high(Values)]:=1;
00648112 33C0 xor eax,eax
00648114 8BD0 mov edx,eax
00648116 83EA05 sub edx,$05
00648119 81E2FF000000 and edx,$000000ff
0064811F C704940000803F mov [esp+edx*4],$3f800000
00648126 40 inc eax
00648127 83F80B cmp eax,$0b
0064812A 75E8 jnz -$18
TestAll.dpr.116: end;
0064812C 81C400040000 add esp,$00000400
00648132 C3 ret

Related

Floating point numbers and their effect on 8-bit microcontroller memory

I am currently working on a project that includes bare-metal programming on an STM8 microcontroller using the SDCC compiler on Linux. Memory on the chip is quite limited, so I'm trying to keep things really lean. I have gotten by with 8-bit and 16-bit variables and things have gone well. But recently I ran into a problem where I really needed a float variable. So I wrote a function that takes in a 16-bit value, converts it to a float, does the math I need, and returns an 8-bit number. This caused my final compiled code on the MCU to go from 1198 bytes to 3462 bytes. I understand that floating point is memory intensive and that many support functions may need to be called to handle floating-point numbers, but it seems crazy to increase the size of the program by that much. I would like some help understanding why this is and what exactly happened.
Specs: MCU stm8151f2
Compiler: SDCC with --opt_code_size option
#include <stdint.h>

/* ADC_MIN is defined elsewhere in the project */
int roundNo(uint16_t bit_input)
{
    float num = (((float)bit_input) - ADC_MIN) / 124.0;
    return num < 0 ? num - 0.5 : num + 0.5;
}
To determine why the code is so large on your particular tool chain, you would need to look at the generated assembly code, and see what FP support calls it makes, then look at the map file to determine the size of each of those functions.
As an example, on Godbolt for AVR using GCC 5.4.0 with -Os (Godbolt does not support STM8 or SDCC, so this is a comparison against another 8-bit architecture), your code generates 6364 bytes, compared to 4081 bytes for an empty function. So the additional code required for the function body is 2283 bytes. Accounting for the fact that you are using a different compiler and a different architecture, these figures are not that different from your results. See in the generated code (below) the rcalls to subroutines such as __divsf3: these are where the bulk of the code will be, and I suspect FP division is by far the largest contributor.
roundNo(unsigned int):
push r12
push r13
push r14
push r15
mov r22,r24
mov r23,r25
ldi r24,0
ldi r25,0
rcall __floatunsisf
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,lo8(69)
rcall __subsf3
ldi r18,0
ldi r19,0
ldi r20,lo8(-8)
ldi r21,lo8(66)
rcall __divsf3
mov r12,r22
mov r13,r23
mov r14,r24
mov r15,r25
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,0
rcall __ltsf2
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,lo8(63)
sbrs r24,7
rjmp .L6
mov r25,r15
mov r24,r14
mov r23,r13
mov r22,r12
rcall __subsf3
rjmp .L7
.L6:
mov r25,r15
mov r24,r14
mov r23,r13
mov r22,r12
rcall __addsf3
.L7:
rcall __fixsfsi
mov r24,r22
mov r25,r23
pop r15
pop r14
pop r13
pop r12
ret
You need to perform the same analysis on the code generated by your tool chain to answer your question. No doubt SDCC is capable of generating an assembly listing and a map file which will allow you to determine exactly what code and FP support is being generated and linked.
Ultimately, though, your use of FP in this case is entirely unnecessary:
int roundNo(uint16_t bit_input)
{
    int s = bit_input - ADC_MIN;
    s += s < 0 ? -62 : 62;   /* rounding bias: 62 is half of 124 */
    return s / 124;
}
On Godbolt this comes to 2283 bytes relative to an empty function. Still somewhat large, but the issue there is most likely that the AVR lacks a DIV instruction, so it calls __divmodhi4. STM8 has a DIV for a 16-bit dividend and an 8-bit divisor, so it will likely be significantly smaller (and faster) on your target.
OK, a version of fixed point that actually works:
#include <stdio.h>
#include <stdint.h>

// Assume a 28.4 format for the math. 12.4 could be used, but round-off may occur.
// Input should be a literal float. (Note that the multiply here is evaluated by the
// compiler and does not generate FP asm code.)
#define TO_FIXED(x) (int)((x) * 16)
// Takes a fixed and converts it to an int - should turn into a right shift by 4.
#define TO_INT(x)   (int)((x) / 16)
typedef int FIXED;

const uint16_t ADC_MIN = 32768;

int roundNo(uint16_t bit_input)
{
    FIXED num = TO_FIXED(bit_input - ADC_MIN) / 124;
    num += num < 0 ? TO_FIXED(-0.5) : TO_FIXED(0.5);
    return TO_INT(num);
}

int main()
{
    printf("%d", roundNo(0));
    return 0;
}
Note that we are using some 32-bit values here, so it will be bigger than your current code. With care, though, it could be converted back to a 12.4 format (16-bit int) if round-off and overflow are managed carefully.
Or grab a full-featured fixed-point library from the web :)
(Update) After writing this, I noticed that @Clifford mentioned that your microcontroller supports a DIV instruction natively, in which case this is redundant. Anyway, I will leave it as a concept that can be applied where DIV is implemented as an extern call, or where DIV takes too many cycles and the goal is to make the calculation faster.
Anyway, shifting and adding is likely to be faster than division if you ever need to squeeze out some extra cycles. Starting from the fact that 124 is almost equal to 4096/33 (the error is 0.00098, i.e. 0.098%, so less than 1 in 1000), you can implement the division as a single multiplication by 33 followed by a shift right by 12 bits (division by 4096). Furthermore, 33 is 32 + 1, so multiplying by 33 is the same as shifting left by 5 and adding the input once more.
Example: you want to divide 5000 by 124, and 5000/124 is approx. 40.323. What we will be doing is:
5,000 << 5 = 160,000
160,000 + 5,000 = 165,000
165,000 >> 12 = 40
Note that this only works for positive numbers. Also note that, if you're really doing lots of multiplications all over the code, then having a single extern mul or div function might result in smaller overall code in the long run, especially if the compiler is not particularly good at optimizing. And if the compiler can just emit a DIV instruction here, then the only thing you can get is a tiny bit of speed improvement, so don't bother with this.
#include <stdint.h>

#define ADC_MIN 2048

uint16_t roundNo(uint16_t bit_input)
{
    // input too low, return zero
    if (bit_input < ADC_MIN)
        return 0;

    bit_input -= (ADC_MIN - 62);
    uint32_t x = bit_input;
    // this gets us x = x * 33
    x <<= 5;
    x += bit_input;
    // this gets us x = x / 4096
    x >>= 12;
    return (uint16_t)x;
}
GCC for AVR with size optimizations produces code in which all calls to extern mul or div functions are gone, but it seems AVR doesn't support shifting by multiple bits in a single instruction (it emits loops that shift 5 times and 12 times, respectively). I don't have a clue what your compiler will do.
If you also need to handle the bit_input < ADC_MIN case, I would handle this part separately, i.e.:
#include <stdint.h>
#include <stdbool.h>

#define ADC_MIN 2048

int16_t roundNo(uint16_t bit_input)
{
    // if the subtraction would result in a negative value,
    // handle it properly
    bool negative = (bit_input < ADC_MIN);
    bit_input = negative ? (ADC_MIN - bit_input) : (bit_input - ADC_MIN);
    // we are always positive from this point on
    bit_input += 62;   // rounding bias (half of 124); ADC_MIN was already subtracted above
    uint32_t x = bit_input;
    x <<= 5;
    x += bit_input;
    x >>= 12;
    return negative ? -(int16_t)x : (int16_t)x;
}

Need help converting pseudo code to an 80x86 Assembly language program with the MASM assembler

I'm using Visual Studio 2012, editing the .asm file inside a windows32 solution.
This is the pseudo code that needs to be changed into assembly:
Declare a 32-bit integer array A[10] in memory
repeat
    Prompt for and input user's array length L
until 0 <= L <= 10
for i := 0 to (L-1)
    Prompt for and input A[i]
end for
while (First character of prompted input string for searching = 'Y' or 'y')
    Prompt for and input value V to be searched
    found := FALSE
    for i := 0 to (L-1)
        if V = A[i] then
            found := TRUE
            break
        end if
    end for
    if (found) then
        Display message that value was found at position i
    else
        Display message that value was not found
    end if
end while
I can manage the inputs, loops, and jumps well enough, but the things tripping me up are making the array have a length that the user inputs, and running through the array to compare the values. What would help me the most is if someone would write and explain a segment of code for the parts I'm not getting. I've tried searching online, but nearly everything I've come across uses a different assembler, making it hard to dissect.
My code thus far is:
.586
.MODEL FLAT
INCLUDE io.h                        ; header file for input/output
.STACK 4096
.DATA
lengthA DWORD ?
promptL BYTE "Please enter a length of the array between 0 and 10: ", 0
string  BYTE 40 DUP (?)
A       DWORD 0 DUP (?)
.CODE
_MainProc PROC
Reread:
    input promptL, string, 40       ; read ASCII characters
    atod string                     ; convert to integer
    ; while (input < 0 or input > 10) re-prompt
    cmp eax, 0
    jl  Reread
    cmp eax, 10
    jg  Reread
    ; end while
    mov lengthA, eax                ; store in memory
_MainProc ENDP
END                                 ; end of source code
After checking that the user input is within range, I just hit a brick wall; I'm not sure how to set up the array A to have the specified length, or even whether I declared A correctly.
You can't modify an assembly-time constant at runtime. The DUP directive is exactly that: an assembly-time directive that reserves memory space by emitting the "init" value as many times as you specify into the resulting machine code. That produces a fixed executable, limited to the size used during assembling.
As your maximum L is 10, you can simply reserve the array at the maximum possible size:
A DWORD 10 DUP (?)
Later in the code you then fetch [lengthA] to know how many elements are in use (so the array behaves like a dynamically sized one, and the unused remainder of the reservation after address A is never processed).
The other option is to reserve memory dynamically at runtime, either by calling an OS service to allocate heap memory or by reserving the array on the stack. Both options are considerably more advanced than using fixed-size memory as above.
I don't see any request for dynamic memory allocation in your pseudo code; the fixed-size array solution looks fine to me. If you are already struggling with this, you are probably not ready to write your own dynamic memory manager/allocator.
EDIT: Actually, the pseudo code does specify that you should reserve fixed-size memory; I somehow overlooked it on first read:
Declare a 32-bit integer array A[10] in memory
So here is an example of the input loop using [lengthA]:
; for i := 0 to (L-1)
xor edi,edi ; index i
input_loop:
cmp edi,[lengthA]
jge input_loop_ends ; if (i >= L), end for
; TODO: Prompt for and input A[i]
; eax = input value, preserve edi (!)
mov [A + 4*edi],eax ; store value into A[i]
; end for
inc edi ; ++i
jmp input_loop
input_loop_ends:
; memory at address A contains L many DWORD values from user

Delphi Optimize IndexOf Function

Could someone help me speed up my Delphi function that finds a value in a byte array, without using binary search?
I call this function thousands of times; is it possible to optimize it with assembly?
Thank you so much.
function IndexOf(const List: TArray<Byte>; const Value: Byte): Integer;
var
  I: Integer;
begin
  for I := Low(List) to High(List) do
  begin
    if List[I] = Value then
      Exit(I);
  end;
  Result := -1;
end;
The length of the array is about 15 items.
Well, let's think. First, please fix this line:
For I := Low( List ) to High( List ) do
(you had forgotten the 'do' at the end). When we compile without optimization, here is the assembly code for this loop:
Unit1.pas.29: If List [I] = Value then
005C5E7A 8B45FC mov eax,[ebp-$04]
005C5E7D 8B55F0 mov edx,[ebp-$10]
005C5E80 8A0410 mov al,[eax+edx]
005C5E83 3A45FB cmp al,[ebp-$05]
005C5E86 7508 jnz $005c5e90
Unit1.pas.30: Exit (I);
005C5E88 8B45F0 mov eax,[ebp-$10]
005C5E8B 8945F4 mov [ebp-$0c],eax
005C5E8E EB0F jmp $005c5e9f
005C5E90 FF45F0 inc dword ptr [ebp-$10]
Unit1.pas.28: For I := Low (List) to High (List) do
005C5E93 FF4DEC dec dword ptr [ebp-$14]
005C5E96 75E2 jnz $005c5e7a
This code is far from optimal: the local variable I really is a local variable, that is, it is stored in RAM, on the stack (you can see this from the [ebp-$10] addresses; ebp is the stack pointer).
So at each iteration we load the address of the array into the eax register (mov eax,[ebp-$04]),
then we load I from the stack into the edx register (mov edx,[ebp-$10]),
then at last we load List[I] into the al register, which is the lower byte of eax (mov al,[eax+edx]),
after which we compare it with the argument Value, again taken from memory, not from a register!
This implementation is extremely slow.
Now let's turn optimization on! It's done in Project Options -> Compiling -> Code generation. Let's look at the new code:
Unit1.pas.29: If List [I] = Value then
005C5E5A 3A1408 cmp dl,[eax+ecx]
005C5E5D 7504 jnz $005c5e63
Unit1.pas.30: Exit (I);
005C5E5F 8BC1 mov eax,ecx
005C5E61 5E pop esi
005C5E62 C3 ret
005C5E63 41 inc ecx
Unit1.pas.28: For I := Low (List) to High (List) do
005C5E64 4E dec esi
005C5E65 75F3 jnz $005c5e5a
Now there are just a few instructions that get repeated over and over:
Value is stored in the dl register (the lower byte of edx),
the address of the 0th element of the array is stored in the eax register,
and I is stored in ecx.
So the line 'if List[I] = Value' compiles to just one assembly instruction:
005C5E5A 3A1408 cmp dl,[eax+ecx]
The next line is a conditional jump; the 3 lines after it are executed once or never (only when the condition is true); and finally there is the increment of I
and the decrement of the loop counter (it's easier to compare against zero than against anything else).
So there is little we can do that the Delphi compiler with the optimizer didn't already do!
If your program permits it, you can try reversing the direction of the search, from the last element to the first:
For I := High( List ) downto Low( List ) do
This way the compiler is happy to compare I with zero to detect that everything has been checked (this comparison is free: when we decrement I and reach zero, the CPU's zero flag is set!).
But with this implementation the behaviour may differ: if several entries equal Value, you'll get the last one, not the first!
Another very easy change is to declare this IndexOf function as inline: that way you'll probably have no function call at all; the code will be inserted at each place where you call it. Function calls are relatively expensive.
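For illustration, a sketch of the same routine with the inline directive added (whether the compiler actually expands it at the call site is its own decision):

function IndexOf(const List: TArray<Byte>; const Value: Byte): Integer; inline;
var
  I: Integer;
begin
  // same body as before; only the inline directive is new
  for I := Low(List) to High(List) do
    if List[I] = Value then
      Exit(I);
  Result := -1;
end;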
There are also some clever methods described in Knuth for searching a simple array as fast as possible. One introduces a 'dummy' last element equal to your Value; that way you don't have to check boundaries (the loop will always find something before going out of range), so there is just one condition inside the loop instead of two. Another method is loop 'unrolling': you write 2, 3, or more iterations inside the loop body, so there are fewer jumps per check. This has more downsides: it is beneficial only for rather large arrays and may even be slower for arrays with 1 or 2 elements.
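Here is a rough sketch of the sentinel idea (my own illustration, assuming the array may be modified temporarily and is not shared between threads):

function IndexOfSentinel(var List: TArray<Byte>; const Value: Byte): Integer;
var
  I, Last: Integer;
  Saved: Byte;
begin
  Last := High(List);
  if Last < 0 then
    Exit(-1);
  Saved := List[Last];
  List[Last] := Value;          // sentinel: the loop is guaranteed to stop
  I := 0;
  while List[I] <> Value do
    Inc(I);
  List[Last] := Saved;          // restore the original last element
  if (I < Last) or (Saved = Value) then
    Result := I
  else
    Result := -1;               // only the sentinel matched
end;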
As others have said, the biggest improvement comes from understanding the data you store: does it change frequently or stay the same for a long time? Do you look for random elements, or are there a few 'leaders' that get most of the attention? Must the elements stay in the order you inserted them, or may they be rearranged? Then you can choose a data structure accordingly. If you look for the same one or two entries all the time and the elements may be rearranged, a simple 'move-to-front' scheme works well: you don't just return the index, you first move the element to the front, so it will be found very quickly the next time.
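A sketch of such a move-to-front lookup (my own illustration; note that it reorders the array and reports the element's new position, which is always 0 on a hit):

function IndexOfMoveToFront(var List: TArray<Byte>; const Value: Byte): Integer;
var
  I: Integer;
begin
  for I := 0 to High(List) do
    if List[I] = Value then
    begin
      if I > 0 then
      begin
        // swap the found element to the front so the next lookup finds it immediately
        List[I] := List[0];
        List[0] := Value;
      end;
      Exit(0);                  // the element now lives at index 0
    end;
  Result := -1;
end;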
If your arrays are long, you can use the x86 built-in string scan, REP SCAS.
It is implemented in microcode and has a moderate start-up time, but it is heavily optimized in the CPU and runs fast given long enough data (>= 100 bytes).
In fact, on a modern CPU it frequently outperforms very clever RISC code.
If your arrays are short, then no amount of optimization of this routine will help, because your problem is then in code not shown in the question, and there is no answer I can give you.
See: http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Internal_Data_Formats_(Delphi)
function IndexOf({$ifndef RunInSeperateThread} const {$endif} List: TArray<byte>; const Value: byte): integer;
//Lock the array if you run this in a separate thread.
{$ifdef CPUX64}
asm
//RCX = List
//DL = byte.
mov r8,[rcx-8] //3 - get the length ASAP.
push rdi //0 - hidden in mov r,m
mov eax,edx //0 - rename
mov rdi,rcx //0 - rename
mov rcx,r8 //0 - rename
mov rdx,r8 //0 - remember the length
//8 cycles setup
repne scasb //2n - repeat until byte found.
pop rdi //1
neg rcx //0
lea rax,[rdx+rcx] //1 result = length - bytes left.
end;
{$ENDIF}
{$ifdef CPUX86}
asm
//EAX = List
//DL = byte.
push edi
mov edi,eax
mov ecx,[eax-4] //get the length
mov eax,edx
mov edx,ecx //remember the length
repne scasb //repeat until byte found.
pop edi
neg ecx
lea eax,[edx+ecx] //result = length - bytes left.
end;
{$endif}
Timings
On my laptop, using a 1 KB array with the target byte at the end, this gives the following timings (lowest time over 100.0000 runs):
Code                          | CPU cycles
                              | Len=1024 | Len=16
------------------------------+----------+--------
Your code, optimizations off  |     5775 |    146
Your code, optimizations on   |     4540 |     93
My code, x86                  |     2726 |     60
My code, x64                  |     2733 |     69
The speed-up is OK (ish), but hardly worth the effort.
If your arrays are short, this code will not help you, and you'll have to resort to other options to optimize your code.
Speed-up possible with binary search
Binary search is an O(log n) operation, vs. O(n) for a naive search.
Using the same array, it would find your data in log2(1024) steps * CPU cycles per step = 10 * 20 ≈ 200 cycles: a 10+ times speed-up over my optimized code.
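For completeness, a minimal binary search sketch (my own illustration; it assumes the array is kept sorted, which the question's data may not allow):

function BinaryIndexOf(const List: TArray<Byte>; const Value: Byte): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := High(List);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if List[Mid] = Value then
      Exit(Mid);
    if List[Mid] < Value then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
  Result := -1;
end;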

Assembly: Error when attempting to increment at array index

Here's a small snippet of assembly code (TASM) where I simply try to increment the value at the current index of the array. The idea is that each element of the "freq" array stores a DWORD count of how many times that ASCII character was seen in the file. To keep the code short, "b" holds the current byte being read.
Declared in data segment
freq DD 256 DUP (0)
b DB ?
___________
Assume b contains current byte
mov bl, b
sub bh, bh
add bx, bx
inc freq[bx]
I receive this error at compile time on the line containing "inc freq[bx]": ERROR Argument to operation or instruction has illegal size.
Any insight is greatly appreciated.
There is no inc that can increment a dword in 16-bit mode. You will have to synthesize it from add/adc, for example:
add freq[bx], 1
adc freq[bx + 2], 0
You might need to add a size override, such as word ptr, or change your array definition to freq DW 512 DUP (0).
Also note that you have to scale the index by 4, not 2 (each counter is a dword), e.g. by doing add bx, bx twice.

Variable changing value unexpectedly only when equal to zero in MASM32

I recently got arrays working in MASM32, but I'm running into a very confusing snag. I have a procedure (AddValue) that accepts one argument and adds that argument to an element of an array called bfmem. Which element to affect is determined by a variable called index. However, index appears to change its value where I would not expect it to.
If index is greater than 0, the program behaves normally. However, if index equals 0, its value gets changed to whatever value was passed to the procedure. This is utterly baffling to me. I don't know much MASM, so forgive me if this is a really simple problem.
.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
.data
bfmem dd 0 dup(30000)
index dd 0
_output db 10 dup(0)
.code
AddValue proc val                       ; adds val to bfmem[index]
    invoke dwtoa, index, addr _output
    invoke StdOut, addr _output         ; prints '0' (as expected)
    mov ecx, index                      ; move index to a register for arithmetic
    mov eax, val                        ; move val to a register for arithmetic
    add [ecx * 4 + bfmem], eax          ; multiply index by 4 for dword size,
                                        ; add the array base to get the element's
                                        ; memory address, and add the value
    invoke dwtoa, index, addr _output
    invoke StdOut, addr _output         ; prints '72' (not as expected)
    ret
AddValue endp

main:
    mov index, 0
    invoke AddValue, 72
    invoke StdIn, addr _output, 1
    invoke ExitProcess, 0
end main
The only thing I can think of is that the assembler is doing some kind of arithmetic optimization (noticing ecx is zero and simplifying the [ecx * 4 + bfmem] expression in some way that changes the output). If so, how can I fix this?
Any help would be appreciated.
The problem is that your declaration:
bfmem dd 0 dup(30000)
says to allocate zero dwords initialized with the value 30000, i.e. nothing at all. So when index is 0, you are overwriting the value of index (the addresses of index and bfmem coincide). With larger indexes you don't see the problem because you're overwriting other memory, such as your output buffer. If you want to verify that this is what's happening, try this:
bfmem dd 0 dup(30000)
index dd 0
_messg db "Here is an output message", 13, 10, 0
Run your program with an index value of 1, 2, 3 and then display the message (_messg) using invoke StdOut.... You'll see that it overwrites parts of the message.
I assume you meant:
bfmem dd 30000 dup(0)
Which reserves 30000 dwords, each initialized to 0.
