Branching when negative accumulator - loops

I'm trying to create a loop that will print if a number given is odd or even (Par). How do I branch the loop when the accumulator value is -1?
START INP // int(input(""))
STA n // n =
LOOP LDA n //
BRZ END // while n !=0:
SUB En // n - 1
STA n // n =
INP // int(input(""))
ADD sum //
STA sum //
BRA LOOP //
END LDA sum
OUT
BRP PO
PO LDA sum
BRZ EXIT
LDA sum
SUB TO
STA sum
BRA PO
ODDE LDA O
OTC
LDA D
OTC
LDA D
OTC
LDA E
OTC
O DAT 79
D DAT 68
E DAT 69
HLT
EXIT BRP PAR
HLT
PAR LDA P
OTC
LDA A
OTC
LDA R
OTC
P DAT 80
A DAT 65
R DAT 82
HLT
TO DAT 2
n DAT 0
sum DAT 0
En DAT 1
Par DAT -1

How do I branch the loop when the accumulator value is -1?
The LMC defines mailboxes with values between 0 and 999. They cannot be negative. Even though you can subtract a bigger value from a smaller value, the accumulator's value is then undefined. According to Wikipedia:
SUBTRACT [...] the actions of the accumulator are not defined for subtract instructions that cause negative results - however, a negative flag will be set so that 7xx (BRZ) and 8xx (BRP) can be used properly.
So the only way to reliably detect a negative value is by using BRP: that branch instruction will jump to the provided target address unless a recent subtraction had set the negative flag.
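The flag behaviour described above can be modelled in a few lines of Python. This is only a sketch: the names lmc_sub and is_even are made up for illustration, and wrapping the accumulator modulo 1000 on underflow is an assumption (the LMC leaves that value undefined, and simulators differ). The point it tries to show is that the parity loop can rely on the negative flag alone and never needs to inspect the accumulator after an underflow.

```python
def lmc_sub(acc, operand):
    """Model SUB: return (accumulator, negative_flag)."""
    result = acc - operand
    if result < 0:
        # Assumption: wrap modulo 1000. The real value is undefined,
        # so correct LMC code must not depend on it.
        return result % 1000, True
    return result, False


def is_even(n):
    """Mirror the LMC loop: BRZ -> even, SUB two, BRP -> repeat, else odd."""
    acc = n
    while True:
        if acc == 0:            # BRZ: reached zero without underflow
            return True
        acc, negative = lmc_sub(acc, 2)
        if negative:            # BRP not taken: the subtraction underflowed
            return False


print(is_even(6), is_even(7))  # True False
```

The zero test comes before the subtraction, so BRZ is only ever evaluated on a well-defined accumulator.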
Code review
There are the following issues in your code:
Par DAT -1: as stated above, you cannot store -1 in an LMC mailbox. Mailboxes can only store values between 000 and 999.
Par versus PAR: you have two labels that only differ in capitalisation. LMC implementations are usually not case sensitive, so this would make those two labels the same. Better use entirely different labels.
BRP PO: The label PO points to the very next instruction, so this means that code execution will always continue at that instruction, whether you branch or not. It makes this instruction useless.
O DAT 79: this line appears right after a set of instructions that ends with OTC. If ever that code is executed, it will run into this DAT line. That could lead to undefined behaviour. You don't want this to happen. So make sure that DAT mailboxes are shielded from code execution. Add a HLT before a block of DAT to avoid that they are ever executed as if they were code. You have a similar issue at P DAT 80.
BRZ EXIT: at the EXIT address, you have a BRP, but as you can only arrive there when the accumulator is zero, the negative flag will not be set, and so BRP will branch always. Note that BRP branches when the negative flag is not set.
ODDE: this label is never referenced, and that code block is never executed. You could consider changing the BRA -- that appears just before it -- to a BRP. Then the execution will fall through when the last subtraction led to a negative result (virtually -1 in your case, but the accumulator is undefined).
If you correct all these issues, you'll get to an implementation that will be very close to this working version

Related

How can you reverse a list in LMC language?

I have a list in LMC and I would like to try to reverse it like so :
tab dat 111
dat 222
dat 333
dat 444
dat 555
tab dat 555
dat 444
dat 333
dat 222
dat 111
-I tried to find the right element first by using the table size
-Then I subtracted 200 from that instruction, so that it turns from 520 -> 320.
-Essentially I changed the instruction from LOAD the content of the 20th square in the RAM into the accumulator to STORE what is currently in the accumulator into the 20th square in the RAM
-Then I loaded the content of tab at index 0 into the accumulator (111), then saved it in the last index
-I don't know what I have to do afterwards
-I feel like my approach to the problem is somehow wrong
right_el lda size
sub one
sta size
lda load
add size
sub 2hund
sta save
load lda tab
bra save
inc lda load
add one
sto load
bra load
save dat
bra right_el
left_el dat
tab dat 111
dat 222
dat 333
dat 444
dat 555
one dat 1
size dat 5
temp dat
2hund dat 200
I tried to run the program step by step. I managed to turn the table into:
tab dat 111
dat 222
dat 333
dat 444
dat 111
but I don't know what to do afterwards
This is a good start. A few issues:
At save you will write the moved value, but thereby lose the value that was sitting there.
temp is not used, but you would need it for saving the original value before it is overwritten, so that you can then read it again from temp and write that back to the first half of the list.
When branching back to the top, you need size to be reduced by 2, not by 1, because the load address has increased by one, so the distance to the target is reduced by 2, not 1. You could solve this by changing the start of your program to:
lda size
add one # to compensate for the minus 2
sta size
right_el lda size
sub two
sta size
... and define two as 2.
You need a stop condition. When the reduced size is zero or less, the program should stop.
The code at inc is never executed. It should be.
You need two more self-modifying instructions. You currently have them for:
Reading from the left side of the list
Writing to the right side of the list
But you also need two for:
Reading from the right side of the list
Writing to the left side of the list
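As a cross-check of the points above, here is the same swap loop sketched in Python. The function name reverse_in_place and the variables left and distance are only illustrative: list indexing stands in for the four self-modifying load/store instructions, and distance plays the role of the reduced size.

```python
def reverse_in_place(tab):
    """Swap elements from both ends, moving inwards."""
    left = 0
    distance = len(tab) - 1          # distance from left element to its partner
    while distance > 0:              # stop condition: zero or less
        right = left + distance
        temp = tab[right]            # save the value about to be overwritten
        tab[right] = tab[left]       # copy left -> right
        tab[left] = temp             # copy saved value right -> left
        left += 1                    # load address moves up by one ...
        distance -= 2                # ... so the distance shrinks by two
    return tab


print(reverse_in_place([111, 222, 333, 444, 555]))
# [555, 444, 333, 222, 111]
```

With an odd number of elements the middle one is never touched, because the distance reaches zero there.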
Some other remarks:
Your code mixes two variants of mnemonics: sto and sta. I would stick with one flavour.
Instead of bra save, you could just move that targeted code block right there, so no branching is needed.
I would use twohund as label instead of 2hund. It is common practice to not start identifiers with digits, and some simulators might even have a problem with it.
I would use loop as label instead of right_el as surely the loop will have to cover the complete swap -- from left to right and vice versa.
The following three instructions:
sta size
lda load
add size
Can be written with just two:
sta size
add load
Here is the resulting code -- I suffixed a few of your labels with "left" and "right" so I could add my own and make the distinction:
start LDA size
ADD one # to compensate for the minus 2
STA size
loop LDA size
SUB two
BRP continue # check the loop-stop condition
quit HLT
continue BRZ quit
STA size
ADD loadleft # add size in one go
STA loadright # manage the other dynamic opcode
SUB twohund
STA saveright
SUB size
STA saveleft # and another dynamic code.
loadright DAT
STA temp # first save the value that is targeted
loadleft LDA tab
saveright DAT
LDA temp # copy in the other direction
saveleft DAT
inc LDA loadleft
ADD one
STA loadleft # use consistent mnemonic
BRA loop
tab DAT 111
DAT 222
DAT 333
DAT 444
DAT 555
one DAT 1
two DAT 2
size DAT 5
temp DAT
twohund DAT 200

Programming arrays in LMC

I am working on this challenge:
The program needs to accept a sequence of integers. It ends with number 999. The integers (except 999) are placed in a list.
The integers must be less than or equal to 99. Any inputs greater than 99 are not placed in the list.
If the input would have more than ten numbers, only the first ten are stored.
999 is not part of the output.
I don't know how to limit the list length to ten (10) numbers. Also I don't know how to output the list in reverse order.
This is my code:
start INP
STA temp
SUB big
BRZ doout
LDA temp
SUB hundred
BRP start
sub one
STA N
xx STA ARR
LDA xx
add one
sta xx
BRA start
doout HLT
temp dat 0
big dat 999
hundred dat 100
ARR dat
one dat 1
N dat 10
The xx in your program shows that you haven't taken the hint from How can I store an unknown number of inputs in different addresses in LMC (little-man-computer)?
It explains how you can have self-modifying code to walk through an array -- either to store values or to load them.
In your attempt there is no section that deals with outputting.
For the start section of the program I would actually suggest to subtract first the 100 and then 899 (which amounts to 999). That way you can keep the (reducing) input in the accumulator without having to restore it.
Also, due to an ambiguity in the specification of LMC, it is not entirely "safe" to do a BRZ right after a SUB (this is because the content of the accumulator is undefined/unspecified when there is underflow, so in theory it could be 0). You should always first do a BRP before doing a BRZ in the branched code. However, as input cannot be greater than 999, a BRP is enough to detect equality.
For the self modifying part, you can set an end-marker in your array data section, and define the LDA and STA instructions that would read/store a value at the end of the array. Whenever your code has that exact instruction, you know you have reached the end.
Here is how it can work:
LDA store # Initialise dynamic store instruction
STA dyna1
loop INP
dyna1 STA array
SUB toobig
BRP skip
LDA dyna1
ADD one
STA dyna1
SUB staend
BRP print
BRA loop
skip SUB trailer
BRP print # Safer to do BRP than BRZ
BRA loop # Input was less than 999
print LDA dyna1 # Convert dynamic store instruction
SUB store # ... to index
ADD load # ... to load instruction
STA dyna2
loop2 LDA dyna2
SUB one
STA dyna2
SUB load
BRP dyna2
end HLT # all done
dyna2 LDA array
OUT
BRA loop2
store STA array
load LDA array
staend STA after
one DAT 1
toobig DAT 100
trailer DAT 899
array DAT
DAT
DAT
DAT
DAT
DAT
DAT
DAT
DAT
DAT
after DAT
Note that the instructions at dyna1 and dyna2 are modified during the execution of the loop they are in.
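For readers who find the control flow easier to follow in a high-level language, the LMC program above can be mirrored roughly as follows. This is a sketch, not a translation of every instruction: collect_and_reverse and capacity are names chosen here, inputs are assumed to be integers from 0 to 999, and a growing Python list replaces the self-modifying STA/LDA instructions.

```python
def collect_and_reverse(inputs, capacity=10):
    """Store inputs below 100 (at most `capacity`), stop at 999, return them reversed."""
    array = []
    for value in inputs:
        reduced = value - 100            # SUB toobig
        if reduced >= 0:                 # BRP skip: input was 100 or more
            if reduced - 899 >= 0:       # SUB trailer, BRP print: input was 999
                break
            continue                     # too big: not stored
        array.append(value)              # dyna1 STA array, then advance
        if len(array) == capacity:       # dynamic store reached "after"
            break
    return array[::-1]                   # loop2 walks the array backwards


print(collect_and_reverse([5, 150, 7, 999, 8]))  # [7, 5]
```

As in the LMC version, output starts as soon as the tenth value is stored, without waiting for the 999 terminator.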

pick pair of bit (0b11) in memory array

My embedded system has a 128KB memory array structure for a specific purpose,
and each 2 bits represent one of 4 states (state 0, state 1, state 2, state 3).
I'd like to count the total number of state 3 (0b11) entries in the memory array.
For example, 0xFF001234 = 1111 1111 0000 0000 0001 0010 0011 0100
It counts 5 (0b11).
I searched for an algorithm, but I only found ones that count single bits:
- https://www.geeksforgeeks.org/count-set-bits-in-an-integer/
I hope to avoid a greedy algorithm like comparing every 2 bits with 0b11.
Does anyone have a good idea?
PS: I'm using a LEON3 SPARC V8 32-bit processor and the C language.
You have an array uint32_t states[] where each states[i] represents 16 states?
To count the number of 0b11 states in the variable uint32_t s you can use the following approach:
First, pre-process s such that every state 0b11 leads to exactly one 1 bit and all other states lead to 0 bits. Then count the numbers of 1 bits.
Pre-Processing
Split s into the left bits l and right bits r of each state.
s AB CD EF GH IJ LM NO PQ RS TU VW XY ZΓ ΔΠ ΦΨ ДЖ
l = s & 0xAAAAAAAA = A0 C0 E0 G0 I0 L0 N0 P0 R0 T0 V0 X0 Z0 Δ0 Φ0 Д0
r = s & 0x55555555 = 0B 0D 0F 0H 0J 0M 0O 0Q 0S 0U 0W 0Y 0Γ 0Π 0Ψ 0Ж
Then align the bits of l and r.
(l >>> 1) = 0A 0C 0E 0G 0I 0L 0N 0P 0R 0T 0V 0X 0Z 0Δ 0Φ 0Д
r = 0B 0D 0F 0H 0J 0M 0O 0Q 0S 0U 0W 0Y 0Γ 0Π 0Ψ 0Ж
Finally, use & to get a 1-bit if and only if the state was 0b11.
(l >>> 1) & r = 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0?
Here ? is 1 if the corresponding state was 0b11 and 0 otherwise.
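The same result can be achieved by the simplified formula (s >>> 1) & s & 0x55555555. Here >>> denotes a logical (zero-filling) right shift; in C, with s declared as uint32_t, it is written >>.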
Bit-Counting
To count the 0b11 states in s we only have to count the 1-bits in
(s >>> 1) & s & 0x55555555.
Bit-counting can be done without a loop as explained in the book Hacker's Delight, chapter 5 or in this Stackoverflow answer.
The methods shown here only apply to a single array element. To count the states in your whole array loop over its elements.
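To make the two steps concrete, here is a small Python sketch of the pre-processing followed by a loop-free bit count. The function names are invented for this example, values are assumed to fit in 32 bits, and popcount32 is the commonly cited SWAR population count, used here as one possible choice rather than the only one.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x):
    """Count 1-bits in a 32-bit value without a loop (SWAR method)."""
    x = x - ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F
    return ((x * 0x01010101) & MASK32) >> 24


def count_state3(s):
    """Count the 2-bit fields of s that equal 0b11."""
    marks = (s >> 1) & s & 0x55555555    # one 1-bit per 0b11 state
    return popcount32(marks)


def count_state3_in_array(states):
    """Sum the per-word counts over the whole array."""
    return sum(count_state3(s) for s in states)


print(count_state3(0xFF001234))  # 5
```

In C the same expressions carry over almost unchanged with uint32_t operands.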
Optimization
As pointed out by Lundin in the comments, the operation (s >>> 1) might be expensive if your processor cannot fit uint32_t into its registers. In this case it would be sensible to declare your array states[] not as uint32_t but as whatever works best on your processor – the procedure stays the same, you only have to use more or less 555…. If, for some reason, you cannot change the type of your array, you can still access it as if it had another type, see how to cast an int array to a byte array in C.

How do AVR Assembly BRNE delay loops work?

An online delay loop generator gives me this delay loop of runtime of 0.5s for a chip running at 16MHz.
The questions on my mind are:
Do the branches keep branching if the register becomes negative?
How exactly does one calculate the values that are loaded in the beginning?
ldi r18, 41
ldi r19, 150
ldi r20, 128
L1: dec r20
brne L1
dec r19
brne L1
dec r18
brne L1
To answer your questions exactly:
1: The DEC instruction doesn't know about 'signed' numbers, it just decrements an 8-bit register. The miracle of two's complement arithmetic makes this work at the wraparound (0x00 -> 0xFF is the same bit pattern as 0 -> -1). The DEC instruction also sets the Z flag in the status register, which BRNE uses to determine if branching should happen.
2: You can see from the AVR manual that DEC is a single-cycle instruction. BRNE is also a single cycle when not branching, and 2 cycles when branching. Therefore, to compute the time of your loop, you need to count the number of times each path will be taken.
Consider a single DEC/BRNE loop:
ldi r18, 0 ; ldi only accepts r16..r31
L1: dec r18
brne L1
This loop will execute exactly 256 times: that is 256 cycles of DEC, plus 511 cycles of BRNE (255 taken branches at 2 cycles each and one final non-taken branch at 1 cycle), for a total of 767 cycles. At 16MHz, that's about 48us.
Wrapping that in an outer delay loop:
ldi r17, 10 ; outer counter (ldi only accepts r16..r31)
ldi r18, 0 ; inner counter
L1: dec r18
brne L1
dec r17
brne L1
You can see that the outer loop counter will decrement every time the inner loop counter hits 0. Thus in our example the outer loop DEC/BRNE will happen 10 times (about 3 cycles each, so roughly 30 cycles in total), and the inner loop body will happen 10 x 256 times, so the total time for this loop is about 10 x 48us plus the small outer-loop overhead, roughly 481us. Similarly for 3 nested loops.
From here, it's trivial to figure out how many times each loop should execute to achieve the desired delay. It's the largest number of iterations the outer loop can do less than the desired time, then taking that time out, do the same for the next nested loop, and so on until the inner most loop fills up the tiny amount left.
How exactly does one calculate the values that are loaded in the beginning?
Calculate total amount of cycles => 0.5s * 16000000 = 8000000
Know the total cycles of r20 and r19 loops (from zero to zero), AVR registers are 8 bit, so a full loop is 256 times (dec 0 = 255). dec is 1 cycle. brne is 2 cycles when condition (branch) happens, 1 cycle when not.
So the most inner loop:
L1: dec r20
brne L1
Is from zero to zero (r20=0): 255 * (1+2) + 1 * (1+1) = 767 cycles (255 times the branch is taken, 1 time it goes through).
The second wrapping loop working with r19 is then: 255 * (767+1+2) + 1 * (767+1+1) = 197119 cycles
The single r18 loop when branch is taken is then 197119+1+2 = 197122 cycles. (197121 when branch is not taken = final exit of delay loop, I will avoid this -1 by a trick in next step).
Now this is almost enough to calculate initial r18, let's adjust the total cycles first by the O(1) code, that's three times ldi instruction, which takes 1 cycle: total2 = 8000000 - (1+1+1) + 1 = 7999998 ... wait, what is the last +1 there? That's fake additional cycle to delay, to make the final r18 loop pretend it costs same as non-final, i.e. 197122 cycles.
And that's it, the initial r18 must be enough to wait at least 7999998 cycles: r18 = (7999998 + 197122 - 1) div 197122 = 41. The " + 197122 - 1" part will make sure the abundant cycles fits constraint: 0 <= abundant_cycles < 197122 (remainder by 197122 division).
41 * 197122 = 8082002 ... this is too much, but now we can shave the extra cycles down by also setting r19 and r20 to particular values, to fine-tune the delay. So how much is to be shaved off? 8082002 - 7999998 = 82004 cycles.
The single r19 loop takes 770 cycles when branching and 769 when exiting, so again let's avoid the 769 by adjusting 82004 to only 82003 to be shaved off. 82003 div 770 = 106: 106 r19 loops can be skipped, r19 = 256 - 106 = 150. Now this will shave 81620 cycles, so 82003 - 81620 = 383 cycles more to be shaved off.
The single r20 loop takes 3 cycles when branching and 2 when exiting. Again I will take into account the exiting loop being only 2 cycles -> 383 => 382 to shave off. And 382 div 3 = 127, remainder 1. r20 = 256 - 127 = 129 and do one less to shave additional 3 cycles (to cover that remainder) = 128. Then 2 cycles (3-1) wait is missing to make it a full 8mil.
So:
ldi r18, 41
ldi r19, 150
ldi r20, 128
L1: dec r20
brne L1
dec r19
brne L1
dec r18
brne L1
According to my calculations, this should wait exactly 8000000-2 cycles (if not interrupted by something else).
Let's try to verify:
Initial r20: 127*3 + 1*2 = 383 cycles
Initial r19: 1*(383+1+2) + 148*(767+1+2) + 1*(767+1+1) = 115115 cycles
(that's initial r20 incomplete cycle one time, then 149 times full time r20 cycle with the final one being -1 due to exiting brne)
The r18 total: 1*(115115+1+2) + 39*(197119+1+2) + 1*(197119+1+1) = 7999997 cycles.
And the three ldi are +3 cycles = 7999997+3 = 8000000.
And the missing 2 cycles are nowhere to be seen, so I made somewhere a mistake.
As you can see, the math behind is reasonably simple, but very mundane to do by hand, and prone to mistakes...
Ah, I think I know where I made the mistake. When I'm shaving off the abundant cycles, the termination loop is not involved (that's part of the actual delay process), so I shouldn't have adjusted the to_shave_off cycles by -1. Then after r19 = 106 I would still have 384 cycles to shave off, and that's exactly 384/3 = 128 loops to shave off from r20 = 256-128 = 128. No remainder, no missing cycle, perfect 8mil.
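Since the hand calculation is easy to get wrong, it can help to let a short script do the counting. The sketch below does two things under the cycle costs stated above (ldi = 1, dec = 1, brne = 2 taken / 1 not taken): delay_cycles counts the cycles of the three-register loop, and initial_values repeats the corrected shave-off arithmetic. Both function names are invented here, and initial_values has only been checked against the 0.5 s example; it does not handle every edge case (for instance an r18 above 255, or an excess that would need 256 skipped inner iterations).

```python
INNER_FULL = 255 * 3 + 2                                # 767: r20 from zero to zero
MIDDLE_STEP = INNER_FULL + 1 + 2                        # 770: one r19 pass, branch taken
MIDDLE_FULL = 255 * MIDDLE_STEP + (INNER_FULL + 1 + 1)  # 197119
OUTER_STEP = MIDDLE_FULL + 1 + 2                        # 197122: one r18 pass, branch taken


def delay_cycles(r18, r19, r20):
    """Count cycles of the 3 ldi plus the nested dec/brne loops."""
    cycles = 3                                    # three ldi instructions
    while True:
        iterations = r20 if r20 != 0 else 256     # a start value of 0 wraps to 256 passes
        cycles += 3 * (iterations - 1) + 2        # inner loop, last brne not taken
        r20 = 0
        r19 = (r19 - 1) & 0xFF                    # dec r19
        if r19 != 0:
            cycles += 1 + 2                       # dec + taken brne
            continue
        cycles += 1 + 1                           # dec + non-taken brne
        r18 = (r18 - 1) & 0xFF                    # dec r18
        if r18 != 0:
            cycles += 1 + 2
            continue
        cycles += 1 + 1
        return cycles


def initial_values(total_cycles):
    """Return (r18, r19, r20, leftover_cycles) for the requested delay."""
    target = total_cycles - 3 + 1                 # minus ldi, plus the fake cycle
    r18 = -(-target // OUTER_STEP)                # ceiling division
    excess = r18 * OUTER_STEP - target            # cycles to shave off
    skip19, excess = divmod(excess, MIDDLE_STEP)  # whole r19 passes to skip
    skip20, leftover = divmod(excess, 3)          # whole r20 passes to skip
    return r18, (256 - skip19) % 256, (256 - skip20) % 256, leftover


print(initial_values(8_000_000))      # (41, 150, 128, 0)
print(delay_cycles(41, 150, 128))     # 8000000
```

A non-zero leftover would have to be padded with nop instructions after the loop.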
If you have trouble to follow this reverse calculation, try it other way, imagine 2 bit registers (0..3 values only), and do on paper similar loop with r18=r19=r20=2, and count the cycles manually to see how it is evolving. .. i.e. 3x ldi = +3, dec r20,brne,dec r20,brne(skip) = +5 cycles, dec r19, brne = +3, ... etc.
Edit: and this was explained before by Jester in his links. And I'm too lazy to clean this up down to some simple formula to create your own online calculator.

Find the missing number in a group {0......2^k -1} range

Given an array that has the numbers {0......2^k -1} except for one number,
find a good algorithm that finds the missing number.
Please notice, you can only use:
for A[i] return the value of bit j.
swap A[i] with A[j].
My answer: use divide & conquer. Check bit number K of all the numbers; if the K bit (now we're on the LSB) is 0 then move the number to the left side, if the K bit is 1 then move the number to the right side.
After the 1st iteration, we'd have two groups, where one of them is bigger than the other, so we continue to do the same thing with the smaller group, and I think that I need to check the K-1 bit this time.
But for some reason, when I tried it with 8 numbers, from 0.....7, and removed 4 (say that I want to find out that 4 is the missing number), the algorithm didn't work out so well. So where is my mistake?
I assume you can build xor bit function using get bit j.
The answer will be (xor of all numbers)
PROOF: a xor (2^k-1-a) = 2^k-1 (a and (2^k-1-a) will have different bits in first k positions).
Then 0 xor 1 xor ... xor 2^k-1 = (0 xor 2^k-1) xor (1 xor 2^k-2).... (2^(k-1) pairs) = 0.
if number n is missing the result will be n, because 0 xor 1 xor 2....xor n-1 xor n+1 xor ... = 0 xor 1 xor 2....xor n-1 xor n+1 xor ... xor n xor n = 0 xor n = n
EDIT: This will not work if k = 1.
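A quick way to convince yourself of the XOR argument is to try it. The sketch below uses Python's built-in integer XOR directly instead of building it from the get-bit primitive; missing_by_xor is a made-up name, and k >= 2 is assumed, in line with the EDIT above.

```python
from functools import reduce
from operator import xor


def missing_by_xor(numbers):
    """XOR of all present numbers equals the missing one (for k >= 2)."""
    return reduce(xor, numbers, 0)


k = 3
present = [x for x in range(2 ** k) if x != 4]
print(missing_by_xor(present))  # 4
```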
Ron,
your solution is correct. This problem smells of Quicksort, doesn't it?
What you do with the Kth bit (all 0's to the left, 1's to the right) is called a partition - you need to find the misplaced elements in pairs and swap them. It's the process used in Hoare's Selection and in Quicksort, with special element classification - no need to use a pivot element.
You forgot to tell in the problem statement how many elements there are in the array (2^k-1 or more), i.e. if repetitions are allowed.
If repetitions are not allowed, every partition will indeed be imbalanced by one element. The algorithm to use is an instance of Hoare's Selection (only partition the smaller half). At every partition stage, the number of elements to be considered is halved, hence O(N) running time. This is optimal since every element needs to be known before the solution can be found.
[If repetitions are allowed, use modified Quicksort (recursively partition both halves) until you arrive at an empty half. The running time is probably O(N Lg(N)) then, but this needs to be checked.]
You say that the algorithm failed on your test case: you probably mis-implemented some detail.
An example:
Start with
5132670 (this is range {0..7})
After partitioning on bit weight=4 you get
0132|675
where the shortest half is
675 (this is range {4..7})
After partitioning on bit weight=2, you get
5|67
where the shortest half is
5 (this is range {4..5})
After partitioning on bit weight=1, you get
|5
where the shortest half is empty (this is range {4}).
Done.
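The walk-through above can be sketched in Python as follows, for the case without repetitions. The helper names get_bit, swap and find_missing are illustrative only; the array is touched exclusively through those two helpers, to stay within the operations the question allows. The partition used here is a simple two-pointer scan, so the order inside each half may differ from the example.

```python
def get_bit(a, i, j):
    """Allowed operation 1: value of bit j of A[i]."""
    return (a[i] >> j) & 1


def swap(a, i, j):
    """Allowed operation 2: swap A[i] with A[j]."""
    a[i], a[j] = a[j], a[i]


def find_missing(a, k):
    """Partition on each bit from the top, keeping only the shorter half."""
    lo, hi = 0, len(a)                 # active range is a[lo:hi]
    missing = 0
    for j in range(k - 1, -1, -1):
        i, r = lo, hi - 1
        while i <= r:                  # zeros to the left, ones to the right
            if get_bit(a, i, j) == 0:
                i += 1
            else:
                swap(a, i, r)
                r -= 1
        zeros, ones = i - lo, hi - i
        if zeros < ones:               # shorter half has bit j = 0
            hi = i
        else:                          # shorter half has bit j = 1
            missing |= 1 << j
            lo = i
    return missing


print(find_missing([5, 1, 3, 2, 6, 7, 0], 3))  # 4
```

Each stage handles half as many elements as the previous one, which is where the O(N) total comes from.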
For n, just add them all and subtract the result from n*(n+1)/2.
n*(n+1)/2 is the sum of all numbers 1...n. If one of them is missing, then the sum of the remaining n-1 numbers will be n*(n+1)/2-missingNumber.
Your answer is: n*(n+1)/2-sum, where n is 2^k-1 and sum is the total of the numbers in the array.
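As a small worked example of this formula, assuming the values can be read and added directly (the function name missing_by_sum is made up here):

```python
def missing_by_sum(numbers, k):
    """Expected total of 0..2^k-1 minus the actual total."""
    n = 2 ** k - 1
    return n * (n + 1) // 2 - sum(numbers)


# k = 3: expected total is 7 * 8 / 2 = 28, actual total is 24
print(missing_by_sum([0, 1, 2, 3, 5, 6, 7], 3))  # 4
```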
Given the fact that for a given bit position j, there are exactly 2^(k-1) numbers which have it set to 0, and 2^(k-1) which have it set to 1 use the following algorithm.
start with an array B of boolean of size k
init the array to false everywhere
for each number A[i]
for each position j
get the value v
if v is 1 invert the boolean at position j
end for
end for
If a position is false at the end then the missing number does have a zero at this position, otherwise it has a one (for k > 1; if k = 1 then it is the inverse). Now to implement your array of booleans, create an array of size 2k, where the lower k elements are set to 0 and the upper k are set to 1. Then
invert the boolean at position j
is simply
swap B[j] with B[j+k].
With this representation the missing number is the lower k elements of the array B. Well, this algorithm is still O(k*2^k), but you can say it is O(n*log(n)) in the size of the input.
You can consider the elements as strings of k bits, and at each step i, if the number of ones or zeros in position i is 2^(k-i), you should remove all those strings and continue. For example:
100 111 010
101 110 000 011
so
100 111 101 110 will all be removed,
and among 010 000 011, 010 and 011 will be removed because their second bit is 1.
000 remains and its rightmost bit is zero, so 001 is the missing number.

Resources