How do I count on picaxe? - loops

I am trying to count the number of times a button is pressed at input pin C.4 on a PICAXE 14M2. I would then like to have a 'mode' that sets B.4 high for 5 seconds, then low. This 'mode' needs to repeat as many times as the button was pressed beforehand.
If this makes any sense, how would I do this?
Here is what I have so far...
init:
let b0 = 0
main:
low B.1
low B.2
low B.3
low B.4
low B.5
if pinC.4 = 1 then
let b0 = b0 +1
goto mode
Endif
goto main
mode:
high B.4
wait 5
low B.4
goto main

If I understand your question, you want to first count a number of button presses, then output that number of 5-second pulses. But how will your program decide that you've finished your series of button presses and want it to carry on and generate the sequence of pulses?
Here's a possible solution, but you'll have to decide if it's suitable and adapt it if not:
b0 = 0 ' initialise counting variable
w1 = 0 ' initialise timing variable (a 2-byte word)
countpresses:
pause 20 ' wait for 20 ms
w1 = w1 + 1 ' increment the timing variable
if w1 >= 200 then makepulses ' stop counting once 4 seconds have elapsed
if pinC.4 = 0 then countpresses ' loop until button pressed
wait_release:
pause 20
w1 = w1 + 1 ' increment the timing variable
if pinC.4 = 1 then wait_release ' loop until button released
b0 = b0 + 1 ' increment the counter
goto countpresses ' go back and wait for the next press
makepulses:
if b0 > 0 then ' skip the pulses if the button was never pressed
for b1 = 1 to b0
high B.4
pause 5000 ' take B.4 high for 5 seconds
low B.4
pause 1000 ' and low for 1 second between pulses
next b1
endif
This will count how many times you press the button in a 4-second period (200 x 20 ms), then output that number of pulses. The pause statements make sure that you don't count 'bounces' of the switch contacts, which might occur in the few milliseconds after the button is pressed or released, and the second loop makes sure that you only count once for each press rather than incrementing as fast as the PICAXE can go for as long as you hold the button down! You didn't say how long B.4 should stay low between the 5-second high pulses; in the code above I've made that 1 second.
If that's not exactly what you want then it shouldn't be hard to figure out how to modify it, for example to wait for a number of seconds after the last time you release the button.
I've used a word variable for the timing counter so that the maximum time to wait isn't limited to 255 counts - you could change the 200 in the code to any value up to 65535 if you wanted (but you should have a think about what might happen if it got near that value). If you're a PICAXE beginner then do read the section of the manual about how byte and word variables relate to each other, which might not be obvious.

Related

How do AVR Assembly BRNE delay loops work?

An online delay loop generator gives me this delay loop with a runtime of 0.5 s for a chip running at 16 MHz.
The questions on my mind are:
Do the branches keep branching if the register becomes negative?
How exactly does one calculate the values that are loaded in the beginning?
ldi r18, 41
ldi r19, 150
ldi r20, 128
L1: dec r20
brne L1
dec r19
brne L1
dec r18
brne L1
To answer your questions exactly:
1: The DEC instruction doesn't know about 'signed' numbers; it just decrements an 8-bit register. Two's complement arithmetic makes this work at the wraparound (0x00 -> 0xFF is the same bit pattern as 0 -> -1). DEC also sets the Z flag in the status register, which BRNE uses to decide whether to branch, so the loop keeps branching through the wraparound and only falls through when the register reaches exactly zero.
2: The AVR instruction set manual shows that DEC is a single-cycle instruction. BRNE takes 1 cycle when not branching and 2 cycles when branching. Therefore, to compute the time of your loop, you need to count the number of times each path is taken.
Consider a single DEC/BRNE loop:
ldi r16, 0 ; LDI only accepts r16-r31
L1: dec r16
brne L1
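This loop will execute exactly 256 times, which is 256 cycles of DEC and 511 cycles of BRNE (2 cycles for each of the 255 taken branches plus 1 for the final fall-through), for a total of 767 cycles. At 16 MHz, that's just under 48 us (47.9375 us).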
Wrapping that in an outer delay loop:
ldi r17, 10
ldi r16, 0
L1: dec r16
brne L1
dec r17
brne L1
You can see that the outer loop counter decrements every time the inner loop counter hits 0. Thus in our example the outer DEC/BRNE pair executes 10 times (10 + 9 x 2 + 1 = 29 cycles), and the inner loop runs 10 x 256 times (10 x 767 = 7670 cycles), so together with the two LDIs the whole loop takes 7670 + 29 + 2 = 7701 cycles, or about 481 us at 16 MHz. The same reasoning extends to 3 nested loops.
From here, it's straightforward to work out how many times each loop should execute to achieve the desired delay: pick the largest number of outer-loop iterations that fits within the desired time, subtract the time they take, then do the same for the next loop inward, and so on until the innermost loop fills the small amount left.
How exactly does one calculate the values that are loaded in the beginning?
Calculate the total number of cycles: 0.5 s * 16000000 Hz = 8000000.
Know the total cycles of the r20 and r19 loops when they run from zero to zero. AVR registers are 8 bits, so a full loop is 256 iterations (dec of 0 gives 255). DEC takes 1 cycle; BRNE takes 2 cycles when the branch is taken and 1 cycle when it is not.
So the innermost loop:
L1: dec r20
brne L1
runs from zero to zero (r20 = 0): 255 * (1+2) + 1 * (1+1) = 767 cycles (255 times the branch is taken, 1 time it falls through).
The second wrapping loop working with r19 is then: 255 * (767+1+2) + 1 * (767+1+1) = 197119 cycles
The single r18 loop when the branch is taken is then 197119+1+2 = 197122 cycles (197121 when the branch is not taken, i.e. the final exit of the delay loop; I will avoid this -1 with a trick in the next step).
Now this is almost enough to calculate the initial r18. First, adjust the total cycles for the O(1) code, i.e. the three LDI instructions at 1 cycle each: total2 = 8000000 - (1+1+1) + 1 = 7999998. What is the last +1 there? It's a fake additional cycle of delay that makes the final r18 loop pretend to cost the same as a non-final one, i.e. 197122 cycles.
And that's it: the initial r18 must be enough to wait at least 7999998 cycles: r18 = (7999998 + 197122 - 1) div 197122 = 41. The "+ 197122 - 1" part rounds the division up, so the excess cycles satisfy 0 <= excess_cycles < 197122.
41 * 197122 = 8082002 ... this is too much, but now we can shave off the extra cycles by also setting r19 and r20 to particular values to fine-tune the delay. So how much is to be shaved off? 8082002 - 7999998 = 82004 cycles.
The single r19 loop takes 770 cycles when branching and 769 when exiting, so again let's avoid the 769 by adjusting 82004 to only 82003 to be shaved off. 82003 div 770 = 106: 106 r19 loops can be skipped, r19 = 256 - 106 = 150. Now this will shave 81620 cycles, so 82003 - 81620 = 383 cycles more to be shaved off.
The single r20 loop takes 3 cycles when branching and 2 when exiting. Again I will take into account the exiting loop being only 2 cycles: 383 => 382 to shave off. And 382 div 3 = 127, remainder 1. r20 = 256 - 127 = 129, and one less to shave an additional 3 cycles (to cover that remainder) gives 128. That leaves 2 cycles (3 - 1) of wait missing to make it a full 8 million.
So:
ldi r18, 41
ldi r19, 150
ldi r20, 128
L1: dec r20
brne L1
dec r19
brne L1
dec r18
brne L1
According to my calculations, this should wait exactly 8000000 - 2 cycles (if not interrupted by something else).
Let's try to verify:
Initial r20: 127*3 + 1*2 = 383 cycles
Initial r19: 1*(383+1+2) + 148*(767+1+2) + 1*(767+1+1) = 115115 cycles
(that's the incomplete initial r20 loop once, then 149 full r20 loops, the final one being 1 cycle shorter due to the exiting brne)
The r18 total: 1*(115115+1+2) + 39*(197119+1+2) + 1*(197119+1+1) = 7999997 cycles.
And the three ldi are +3 cycles = 7999997+3 = 8000000.
And the missing 2 cycles are nowhere to be seen, so I made a mistake somewhere.
As you can see, the math behind it is reasonably simple, but very tedious to do by hand, and prone to mistakes...
Ah, I think I know where I made the mistake. When shaving off the excess cycles, the terminating loop is not involved (that's part of the actual delay process), so I shouldn't have adjusted the to-shave-off cycles by -1. Then, after skipping 106 r19 loops (r19 = 150), I would still have to shave off 384 cycles, and that's exactly 384/3 = 128 loops to shave off from r20: r20 = 256 - 128 = 128. No remainder, no missing cycle, a perfect 8 million.
If you have trouble following this reverse calculation, try it the other way: imagine 2-bit registers (values 0..3 only), do a similar loop on paper with r18 = r19 = r20 = 2, and count the cycles manually to see how it evolves, i.e. 3x ldi = +3, dec r20, brne, dec r20, brne (skip) = +5 cycles, dec r19, brne = +3, ... etc.
Edit: this was explained before by Jester in his links. And I'm too lazy to clean this up into a simple formula for building your own online calculator.

Summing up multiple variable scores depending on their score

tl;dr: I need to first dichotomize a set of variables to 0/1, then sum up these values. I need to do this for 14x8 variables, so I am looking for a way to do this in a loop.
Hi guys,
I have a very specific problem I need your help with:
Description of problem:
In my dataset I have 14 sets of 8 variables each (e.g. a1 to a8, b1 to b8, c1 to c8, etc.) with scores ranging from 1 to 6. Note that the variables are non-contiguous, with string variables in between them (which I need for a different purpose).
I now want to compute scores for each set of these variables (e.g. scoreA, scoreB, scoreC). The score should be computed according to the following rule:
scoreA = 0.
If a1 > 1 then increment scoreA by 1.
If a2 > 1 then increment scoreA by 1.
... etc.
Example:
Dataset:
1 5 6 3 2 1 1 5
1 1 1 3 4 6 2 3
scores:
5
5
My previous attempts:
I know I could do this task by first recoding the variables to dichotomize them and then summing up these values. This has two large drawbacks for me: firstly, it creates a lot of new variables which I don't need; secondly, it is a very tedious and repetitive task, since I have multiple sets of variables (with different variable names) on which I need to do the same thing.
I took a look at the DO REPEAT and LOOP with VECTOR commands, but I don't seem to fully understand how they work, and I was not able to transfer solutions from other examples I read online to my problem.
I would be happy with a solution that only loops through one set of variables and does the task, then I would adjust the syntax appropriately for my other 13 sets of variables. Hope you can help me out.
See two solutions: one loops over each of the sets, the second is a macro which loops over a list of sets:
* creating some sample data.
DATA LIST list/a1 to a8 b1 to b8 c1 to c8 hello1 to hello8.
BEGIN DATA
1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 3 3 3 1 1 1 1 4 4 4 4
1 1 1 1 2 3 4 5 1 1 1 2 3 4 1 0 0 0 0 0 1 2 1 2 3 2 1 2 3 2 1 6
END DATA.
* solution 1: a loop for each set (example for sets a, b and c).
compute scoreA=0.
compute scoreB=0.
compute scoreC=0.
do repeat
a=a1 a2 a3 a4 a5 a6 a7 a8
/b=b1 b2 b3 b4 b5 b6 b7 b8
/c=c1 c2 c3 c4 c5 c6 c7 c8./* if variable names are consecutive, replace with "a1 to a8" etc.
compute scoreA=scoreA+(a>1).
compute scoreB=scoreB+(b>1).
compute scoreC=scoreC+(c>1).
end repeat.
execute.
Doing this for 14 different sets is no fun, so assuming your sets are always named <set>1 to <set>8, you can use the following macro:
define DoSets (SetList=!cmdend)
!do !set !in (!SetList)
compute !concat("Score_",!set)=0.
do repeat !set=!concat(!set,"1") !concat(!set,"2") !concat(!set,"3") !concat(!set,"4") !concat(!set,"5") !concat(!set,"6") !concat(!set,"7") !concat(!set,"8").
compute !concat("Score_",!set)=!concat("Score_",!set)+(!set>1).
end repeat.
!doend
execute.
!enddefine.
* now call the macro and list all set names.
DoSets SetList= a b c hello.
The DO REPEAT loop above works perfectly, but with a lot of sets of variables it would be tedious to write. Using Python programmability, it can be generated automatically regardless of the variable order. The code below assumes any number of variables with names of the form lowercase letter followed by a digit, occurring in sets of 8, and generates and runs the DO REPEAT. For simplicity it generates one loop for each output variable, but these will all be executed in a single data pass. If the name pattern is different, the code could be adjusted if you say what it is.
begin program.
import spss, spssaux
# All variables named <lowercase letter><digit>, e.g. a1 ... a8, sorted by name
vars = sorted(spssaux.VariableDict(pattern=r"[a-z]\d").variables)
cmd = """compute %(score)s = 0.
do repeat index = %(vlist)s.
compute %(score)s = %(score)s + (index > 1).
end repeat."""
if len(vars) % 8 != 0:
    raise ValueError("Number of input variables not a multiple of 8")
# Each consecutive block of 8 names is one set; the score is named after its letter
for v in range(0, len(vars), 8):
    score = "score" + vars[v][0]
    vlist = " ".join(vars[v:v+8])
    spss.Submit(cmd % locals())
end program.
execute.

PWM signal generation based on Mic input

I am using MPC 7555 controller. It has a 16 bit sigma delta ADC.
A signal called mic input is fed to this ADC pin. Based on the voltage, a PWM signal should be generated at the same frequency as the ADC sampling rate.
For e.g.
0.1 V = 2 percent
0.2 V = 4 percent
0.3 V = 6 percent....and so on
So I came up with the following logic:
5V - 0xFFFF in digital
0.1V - 1310
0.2V - 2620 and so on
So dividing the digital value by 655 gives the exact duty cycle value:
1310/655 = 2
2620/655 = 4........
But the ADC could also return 1309 for 0.1 V, which divided by 655 yields 1 and not 2.
Is there any way to avoid this, or does anyone have a better solution?
The task is to output PWM at the same rate as the ADC conversion rate.
Suppose the ADC conversion time is T (you can establish this by reading a free-running timer counter), and suppose the ADC conversion value is V. Then the time H that the PWM output spends "high" must be
H = T * V / 0xFFFF
Every time an ADC conversion is available, you (cancel any pending one-shot timer interrupt and) set the PWM output to 1 and trigger a one-shot timer at time H. When it interrupts, you set the PWM output to 0 (or the other way round if you have inverse logic).
If the input is 0x0000 or 0xFFFF you can employ an alternative strategy - set the output to 0 or 1, but don't deploy the one-shot timer.
To get the best fidelity in the PWM signal, you would do better to work directly at the resolution of the PWM rather than calculate a percentage only to then convert that to a PWM count. Using an integer percentage, you effectively limit your resolution to 6.64 bits per sample (i.e. log10(100)/log10(2)).
So let's say your PWM count per cycle is PWM_MAX and your ADC maximum is ADC_MAX; then the PWM high period would be:
pwm_high = adc_val * PWM_MAX / ADC_MAX ;
It is important to perform the multiplication first to avoid loss of information (and to do it in an integer type wide enough to hold the product). If PWM_MAX is sufficiently high, there is probably no need to worry about integer division rounding toward zero rather than to the nearest integer, but if that is a concern (for low PWM_MAX) then:
pwm_high = ((adc_val * PWM_MAX) + (ADC_MAX / 2)) / ADC_MAX ;
For example, say your PWM_MAX is only 100 (i.e. the resolution truly is in integer percent); then in the first case:
pwm_high = 1310 * 100 / 0xFFFF = 1
and in the second:
pwm_high = ((1310 * 100) + 0x7FFF) / 0xFFFF = 2
However if PWM_MAX is a more suitable 4096 perhaps, then:
pwm_high = 1310 * 4096 / 0xFFFF = 81
or
pwm_high = ((1310 * 4096) + 0x7fff) / 0xFFFF = 82
With PWM_MAX at 4096 you have effectively 12 bits of resolution and will maintain much higher fidelity as well as directly calculating the correct PWM value.

comparing and interpreting two time counters

I have two counters that denote time. One of them is a 64-bit counter which should be interpreted as follows:
1) Most significant 32 bits indicate the number of seconds since a fixed point in time
2) Lower 32 bits indicate a fraction of a second.
I don't know how to interpret the other, 48-bit counter.
What I do know is this:
When COUNTER 1 increased by 508032, COUNTER 2 increased by 5914.
Meaning COUNTER1 (time 2 - time 1) = 508032
during that time
COUNTER2 (time 2 - time 1) = 5914.
I need a formula to figure out how to interpret the resolution of COUNTER2.
Counter 2 increments 50 million times per second.
5914/(508032/2^32) ~= 50 million
To put it in words: Counter 1 increments 2^32 times per second. So if it increments 508,032 times, that's 1/8454 of a second. In that time, counter 2 incremented 5,914 times. So it would increment 5,914*8,454 times in a full second. That's so close to 50 million (49.997 million) that it's almost certain counter 2's resolution is intended to be 50 million counts per second.

How to optimise this Langton's ant sim?

I'm writing a Langton's ant sim (for rulestring RLR) and am trying to optimise it for speed. Here's the pertinent code as it stands:
#include <stdint.h>
#define AREA_X 65536
#define AREA_Y 65536
#define TURN_LEFT 3
#define TURN_RIGHT 1
int main()
{
uint_fast8_t* state;
uint_fast64_t ant=((AREA_Y/2)*AREA_X) + (AREA_X/2);
uint_fast8_t ant_orientation=0;
uint_fast8_t two_pow_five=32;
uint32_t two_pow_thirty_two=0;/*not fast, relying on exact width for overflow*/
uint_fast8_t change_orientation[4]={0, TURN_RIGHT, TURN_LEFT, TURN_RIGHT};
int_fast64_t move_ant[4]={AREA_X, 1, -AREA_X, -1};
... initialise empty state
while(1)
{
while(two_pow_five--)/*removing this by doing 32 steps per inner loop, ~16% longer*/
{
while(--two_pow_thirty_two)
{
/*one iteration*/
/* 54 seconds for init + 2^32 steps
ant_orientation = ( ant_orientation + (117>>((++state[ant])*2 )) )&3;
state[ant] = (36 >> (state[ant] *2) ) & 3;
ant+=move_ant[ant_orientation];
*/
/* 47 seconds for init + 2^32 steps
ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3;
state[ant] += (state[ant]==2)?-2:1;
ant+=move_ant[ant_orientation];
*/
/* 46 seconds for init + 2^32 steps
ant_orientation = ( ant_orientation + ((state[ant])==1?3:1) )&3;
if(state[ant]==2)
{
--state[ant];
--state[ant];
}
else
++state[ant];
ant+=move_ant[ant_orientation];
*/
/* 44 seconds for init + 2^32 steps
ant_orientation = ( ant_orientation + ((++state[ant])==2?3:1) )&3;
if(state[ant]==3)state[ant]=0;
ant+=move_ant[ant_orientation];
*/
// 37 seconds for init + 2^32 steps
// handle every situation with nested switches and constants
switch(ant_orientation)
{
case 0:
switch(state[ant])
{
case 0:
ant_orientation=1;
state[ant]=1;
++ant;
break;
case 1:
ant_orientation=3;
state[ant]=2;
--ant;
break;
case 2:
ant_orientation=1;
state[ant]=0;
++ant;
break;
}
break;
case 1:
switch(state[ant])
{
...
}
break;
case 2:
switch(state[ant])
{
...
}
break;
case 3:
switch(state[ant])
{
...
}
break;
}
}
}
two_pow_five=32;
... dump to file every 2^37 steps
}
return 0;
}
I have two questions:
I've tried to optimise as best I can in C by trial-and-error testing. Are there any tricks I haven't taken advantage of? Please try to talk in C, not assembly, although I'll probably try assembly at some point.
Is there a better way to model the problem to increase speed?
More info: Portability doesn't matter. I'm on 64-bit Linux, using gcc, an i5-2500K and 16 GB of RAM. The state array as it stands uses 4 GiB; the program could feasibly use 12 GiB of RAM. sizeof(uint_fast8_t)=1. Bounds checks are intentionally absent; corruption is easy to spot manually from the dumps.
edit: Perhaps counter-intuitively, piling on the switch statements instead of eliminating them has yielded the best efficiency so far.
edit: I've re-modelled the problem and come up with something quicker than a single step per iteration. Before, each state element used two bits and described a single cell in the Langton's ant grid. The new way uses all 8 bits, and describes a 2x2 section of the grid. Every iteration a variable number of steps are done, by looking up pre-computed values of step count, new orientation and new state for the current state+orientation. Assuming everything is equally likely it averages to 2 steps taken per iteration. As a bonus it uses 1/4 of the memory to model the same area:
while(--iteration)
{
// roughly 31 seconds per 2^32 steps
table_offset=(state[ant]*24)+(ant_orientation*3);
it+=twoxtwo_table[table_offset+0];
state[ant]=twoxtwo_table[table_offset+2];
ant+=move_ant2x2[(ant_orientation=twoxtwo_table[table_offset+1])];
}
Haven't tried optimising it yet, the next thing to try is eliminating the offset equation and lookups with nested switches and constants like before (but with 648 inner cases instead of 12).
Or, you can use a single unsigned byte constant as an artificial register instead of branching:
value: 1 3 1 1
bits: 01 11 01 01 ---> 117 decimal value for an unsigned byte
index 3 2 1 0 ---> get the lowest 2 bits to get "1" (no shift)
--> get the second 2 bits to get "1" (shifting by 2)
--> get the third 2 bits to get "3" (shifting by 4)
--> get the last 2 bits to get "1" (shifting by 6)
Then AND the result with binary 11 (decimal 3) to get your value.
(117 >> ((++state[ant])*2)) & 3 gives you TURN_RIGHT or TURN_LEFT.
Example:
++state[ant]=0: (117 >> (0*2)) & 3 --> 117 & 3 = 1
++state[ant]=1: (117 >> (1*2)) & 3 --> (117 >> 2) & 3 = 29 & 3 = 1
++state[ant]=2: (117 >> (2*2)) & 3 --> (117 >> 4) & 3 = 7 & 3 = 3 --> turn left
++state[ant]=3: (117 >> (3*2)) & 3 --> (117 >> 6) & 3 = 1 & 3 = 1
At most a six-bit shift, one multiplication and one AND may be better than branching.
Don't forget that the constant and the operands undergo integer promotion, so you may want to add a suffix such as U to keep the arithmetic unsigned.
Since you are using unsigned types, the %4 modulus can be replaced with an AND operation:
state[ant]=state[ant]&3; instead of state[ant]=state[ant]%4;
With less capable compilers, this should increase speed.
The hardest part: modulo-3
C = A % B is equivalent to C = A - B * (A / B)
We need state[ant]%3
Result = state[ant] - 3 * (state[ant]/3)
state[ant]/3 is always <= 1 for your valid states (0 to 3).
Only when state[ant] is 3 is state[ant]/3 equal to 1; other values give 0.
When multiplied by 3, that part is 0 or 3 (3 only when state[ant] is 3, otherwise 0).
Result = state[ant] - (0 or 3)
Let's look at all possibilities:
state[ant]=0: 0 - 0 ---> 0 ----> 00100100 shifted by 0 times &3 --> 00000000
state[ant]=1: 1 - 0 ---> 1 ----> 00100100 shifted by 2 times &3 --> 00000001
state[ant]=2: 2 - 0 ---> 2 ----> 00100100 shifted by 4 times &3 --> 00000010
state[ant]=3: 3 - 3 ---> 0 ----> 00100100 shifted by 6 times &3 --> 00000000
00100100 is 36 in decimal.
(36 >> (state[ant] *2) ) & 3 will give you state[ant]%3 for your valid states (0,1,2,3)
Example:
state[ant]=0: 36 >> 0 --> 36 ----> 36& 3 ----> 0 satisfies 0%3
state[ant]=1: 36 >> 2 --> 9 -----> 9 & 3 ----> 1 satisfies 1%3
state[ant]=2: 36 >> 4 --> 2 -----> 2 & 3 ----> 2 satisfies 2%3
state[ant]=3: 36 >> 6 --> 0 -----> 0 & 3 ----> 0 satisfies 3%3
