Related
Ok so I am trying to ditch energia texas instruments arduino style ide and I have used IAR for coding a Tiva C series development board where I was able to use pointers to memory locations to perform specific things like toggling led for example. I have had a hard time doing the same on a dev board running a MSP430FR5994 mcu, I know the memory address of the green led pin to be PORT 1 PIN 1 OR P1.1 on the board. I also have included the msp430.h header file for an api to the board from my ide. What I don't understand is why when in debug my code is changing the value of the correct registers to the correct numbers but it is not altering the board. I have also verified that it is connected to the board as it will not proceed to debug with it unplugged. My direct questions are this: 1 I should be able to alter memory locations with no headerfiles or any special api's as long as I know the specific addresses correct? 2 I did not see anything about clock gating in the data sheet and in debug I can see those registers changing values so is there something other than setting the pin direction and value that I need to do?( the default pin function is generic gpio I checked so I left that register alone. Any ideas or pointing out obvious errors in my approach would be very helpful thanks. In the code below I used the header file names as I could not get the direct pointers to work. Also I was confused by the data sheet as the base address for port 1 was written as 0200H which is 5 hex numbers when I was expecting 4 since the chip is 16bit system? I assumed with the offsets it meant 0x202H etc am I incorrect in this assumption?
registers during debugging image
ti.com/lit/ds/symlink/msp430fr5994.pdf (datasheet port 1 mem locations page 130)
#include <msp430.h>
/**
* main.c
*/
int main(void)
{
WDTCTL = WDTPW | WDTHOLD; // stop watchdog timer
while(1){
int i ;
int j ;
P1DIR = 2;
//*((unsigned int *)0x204Hu) = 2;
P1OUT = 2;
//*((unsigned int *)0x202Hu)= 2;
for( i = 0; i< 2 ; i++){}
P1OUT = 0;
//*((unsigned int *)0x202Hu)= 0;
for (j = 0 ; j< 2; j++){}
}
return 0;
}
See section 12.3.1 of the MSP430FR59xx User's Guide.
After a BOR reset, all port pins are high-impedance with Schmitt
triggers and their module functions disabled to prevent any cross
currents. The application must initialize all port pins including
unused ones (Section 12.3.2) as input high impedance, input with
pulldown, input with pullup, output high, or output low according to
the application needs by configuring PxDIR, PxREN, PxOUT, and PxIES
accordingly. This initialization takes effect as soon as the LOCKLPM5
bit in the PM5CTL register (described in the PMM chapter) is cleared;
until then, the I/Os remain in their high-impedance state with Schmitt
trigger inputs disabled.
And here is the example blinky code provided in the MSP430FR599x Code Examples available to download from here.
#include <msp430.h>
int main(void)
{
WDTCTL = WDTPW | WDTHOLD; // Stop WDT
// Configure GPIO
P1OUT &= ~BIT0; // Clear P1.0 output latch for a defined power-on state
P1DIR |= BIT0; // Set P1.0 to output direction
PM5CTL0 &= ~LOCKLPM5; // Disable the GPIO power-on default high-impedance mode
// to activate previously configured port settings
while(1)
{
P1OUT ^= BIT0; // Toggle LED
__delay_cycles(100000);
}
}
You probably need to add that PM5CTL0 &= ~LOCKLPM5; line to your code.
And then single-step through your code in the debugger to observe the LED. Because if you let your code run at full speed the delay loops are way too short to observe the LED flash with your eye.
This is derived from one of my examples; I don't have a card handy to test it on, but it should just work or be close.
startup.s
.word hang /* 0xFFE0 */
.word hang /* 0xFFE2 */
.word hang /* 0xFFE4 */
.word hang /* 0xFFE6 */
.word hang /* 0xFFE8 */
.word hang /* 0xFFEA */
.word hang /* 0xFFEC */
.word hang /* 0xFFEE */
.word hang /* 0xFFF0 */
.word hang /* 0xFFF2 */
.word hang /* 0xFFF4 */
.word hang /* 0xFFF6 */
.word hang /* 0xFFF8 */
.word hang /* 0xFFFA */
.word hang /* 0xFFFC */
.word reset /* 0xFFFE */
reset.s
.global reset
reset:
mov #0x03FF,r1
call #notmain
jmp hang
.global hang
hang:
jmp hang
.globl dummy
dummy:
ret
so.c
void dummy ( unsigned short );
#define WDTCTL (*((volatile unsigned short *)0x015C))
#define P1OUT (*((volatile unsigned short *)0x0202))
#define P1DIR (*((volatile unsigned short *)0x0204))
void notmain ( void )
{
unsigned short ra;
WDTCTL = 0x5A80;
P1DIR|=0x02;
while(1)
{
P1OUT |= 0x0002;
for(ra=0;ra<10000;ra++) dummy(ra);
P1OUT &= 0xFFFD;
for(ra=0;ra<10000;ra++) dummy(ra);
}
}
memmap
MEMORY
{
rom : ORIGIN = 0xC000, LENGTH = 0xFFE0-0xC000
ram : ORIGIN = 0x1C00, LENGTH = 0x2C00-0x1C00
vect : ORIGIN = 0xFFE0, LENGTH = 0x20
}
SECTIONS
{
VECTORS : { startup.o } > vect
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
build
msp430-gcc -Wall -O2 -c so.c -o so.o
msp430-ld -T memmap reset.o so.o startup.o -o so.elf
msp430-objdump -D so.elf > so.list
msp430-objcopy -O ihex so.elf out.hex
examine output
so.elf: file format elf32-msp430
Disassembly of section VECTORS:
0000ffe0 <VECTORS>:
ffe0: 0a c0 bic r0, r10
ffe2: 0a c0 bic r0, r10
ffe4: 0a c0 bic r0, r10
ffe6: 0a c0 bic r0, r10
ffe8: 0a c0 bic r0, r10
ffea: 0a c0 bic r0, r10
ffec: 0a c0 bic r0, r10
ffee: 0a c0 bic r0, r10
fff0: 0a c0 bic r0, r10
fff2: 0a c0 bic r0, r10
fff4: 0a c0 bic r0, r10
fff6: 0a c0 bic r0, r10
fff8: 0a c0 bic r0, r10
fffa: 0a c0 bic r0, r10
fffc: 0a c0 bic r0, r10
fffe: 00 c0 bic r0, r0
Disassembly of section .text:
0000c000 <reset>:
c000: 31 40 ff 03 mov #1023, r1 ;#0x03ff
c004: b0 12 0e c0 call #0xc00e
c008: 00 3c jmp $+2 ;abs 0xc00a
0000c00a <hang>:
c00a: ff 3f jmp $+0 ;abs 0xc00a
0000c00c <dummy>:
c00c: 30 41 ret
0000c00e <notmain>:
c00e: 0b 12 push r11
c010: b2 40 80 5a mov #23168, &0x015c ;#0x5a80
c014: 5c 01
c016: a2 d3 04 02 bis #2, &0x0204 ;r3 As==10
c01a: a2 d3 02 02 bis #2, &0x0202 ;r3 As==10
c01e: 0b 43 clr r11
c020: 0f 4b mov r11, r15
c022: b0 12 0c c0 call #0xc00c
c026: 1b 53 inc r11
c028: 3b 90 10 27 cmp #10000, r11 ;#0x2710
c02c: f9 23 jnz $-12 ;abs 0xc020
c02e: b2 f0 fd ff and #-3, &0x0202 ;#0xfffd
c032: 02 02
c034: 0b 43 clr r11
c036: 0f 4b mov r11, r15
c038: b0 12 0c c0 call #0xc00c
c03c: 1b 53 inc r11
c03e: 3b 90 10 27 cmp #10000, r11 ;#0x2710
c042: f9 23 jnz $-12 ;abs 0xc036
c044: ea 3f jmp $-42 ;abs 0xc01a
looks fine the vector table is there and points to the right place, etc.
10,000 might not be enough to see the led blink.
From the datasheet it appears that 0x202 is P1OUT and 0x204 is P1DIR
And you have to get it programmed into the board. I use mspdebug for the boards I have but that program may have stopped working on the eval boards from TI a while ago. And mspdebug supported the Intel hex format. So use objcopy for other formats.
If you were not wanting to use gnu tools then you still have to deal with the vector table and the bootstrap in front of the C code if you are trying to get away from someone's sandbox and do your own thing.
You are on the right path it may be as simple as your delay loops are way way too small and or getting optimized out since they are dead code as written.
If you rely on .bss or .data being initialized then you have more work to do in the linker script and bootstrap. I don't, so don't have that problem, actually wondering why .data was in this linker script...
I've matched addresses to the datasheet for your part, increasing the odds of success. If you use an external function and pass the loop variable to that function (the external can be C or asm, doesn't matter) then the optimizer won't remove it as dead code. That or add volatile on the loop variable and check the disassembly to see that it wasn't removed.
I made a program with atmega16 and was trying to make my own delay_us() so I looked into the compiler avr-gcc library for _delay_us() function and this is its code:
static inline void _delay_us(double __us) __attribute__((always_inline));
/*
\ingroup util_delay
Perform a delay of \c __us microseconds, using _delay_loop_1().
The macro F_CPU is supposed to be defined to a
constant defining the CPU clock frequency (in Hertz).
The maximal possible delay is 768 us / F_CPU in MHz.
If the user requests a delay greater than the maximal possible one,
_delay_us() will automatically call _delay_ms() instead. The user
will not be informed about this case.
*/
void
_delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us; //number of ticks per us * delay time in us
if (__tmp < 1.0)
__ticks = 1;
else if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
__ticks = (uint8_t)__tmp;
_delay_loop_1(__ticks); // function decrements ticks untill it reaches 0( takes 3 cycles)
}
I got confused about how, if I use a 1Mhz clock, this function that contains floating point arithmetic will be able to make small delays (like _delay_us(10)), because executing all the the setup code will definitely take more time than that. So I wrote this program:
#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#define F_CPU 1000000UL
int main()
{
_delay_ms(1000);
DDRB=0XFF;
PORTB=0XFF;
_delay_us(10);
PORTB=0;
for(;;){}
return 0;
}
I simulated it using protues and used an oscilloscope and connected one of PORTB pins to its input. Then I saw that the delay was exactly 10 us. How could the delay be that accurate considering this set up code statement that uses floating point arithmetic:
double __tmp = ((F_CPU) / 4e3) * __ms;
This should have taken a lot of cycles that makes _delay_us(10) exceed the 10 us period, but the time was exactly 10us!!
All the floating point arithmetic is calculated by the preprocessor as these delay functions are actually macros. So at the point the mcu executes the code, all that's left is a loop that uses an integer to do the delay.
typedef unsigned char uint8_t;
#define F_CPU 16000000
extern void _delay_loop_1( uint8_t );
static void _delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us;
if (__tmp < 1.0)
{
__ticks = 1;
}
else
{
if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
{
__ticks = (uint8_t)__tmp;
}
}
_delay_loop_1(__ticks);
}
void fun1 ( void )
{
_delay_us(10);
}
which with gcc can produce this:
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <__zero_reg__+0x3>
the number to feed _delay_loop_1 is computed at compile time not runtime, all of that dead code goes away.
but add this:
void fun2 ( void )
{
uint8_t ra;
for(ra=1;ra<10;ra++) _delay_us(ra);
}
and things change dramatically.
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <fun2>
00000004 <fun2>:
4: 8f 92 push r8
6: 9f 92 push r9
8: af 92 push r10
a: bf 92 push r11
c: cf 92 push r12
e: df 92 push r13
10: ef 92 push r14
12: ff 92 push r15
14: cf 93 push r28
16: c1 e0 ldi r28, 0x01 ; 1
18: 6c 2f mov r22, r28
1a: 70 e0 ldi r23, 0x00 ; 0
1c: 80 e0 ldi r24, 0x00 ; 0
1e: 90 e0 ldi r25, 0x00 ; 0
20: 00 d0 rcall .+0 ; 0x22 <fun2+0x1e>
22: 86 2e mov r8, r22
24: 97 2e mov r9, r23
26: a8 2e mov r10, r24
28: b9 2e mov r11, r25
2a: 2b ea ldi r18, 0xAB ; 171
2c: 3a ea ldi r19, 0xAA ; 170
2e: 4a ea ldi r20, 0xAA ; 170
30: 50 e4 ldi r21, 0x40 ; 64
32: 00 d0 rcall .+0 ; 0x34 <fun2+0x30>
34: c6 2e mov r12, r22
36: d7 2e mov r13, r23
38: e8 2e mov r14, r24
3a: f9 2e mov r15, r25
3c: 20 e0 ldi r18, 0x00 ; 0
3e: 30 e0 ldi r19, 0x00 ; 0
40: 40 e8 ldi r20, 0x80 ; 128
42: 5f e3 ldi r21, 0x3F ; 63
44: 00 d0 rcall .+0 ; 0x46 <fun2+0x42>
46: 87 fd sbrc r24, 7
48: 00 c0 rjmp .+0 ; 0x4a <fun2+0x46>
4a: 20 e0 ldi r18, 0x00 ; 0
4c: 30 e0 ldi r19, 0x00 ; 0
4e: 4f e7 ldi r20, 0x7F ; 127
50: 53 e4 ldi r21, 0x43 ; 67
52: 9f 2d mov r25, r15
54: 8e 2d mov r24, r14
56: 7d 2d mov r23, r13
58: 6c 2d mov r22, r12
5a: 00 d0 rcall .+0 ; 0x5c <fun2+0x58>
5c: 18 16 cp r1, r24
5e: 04 f0 brlt .+0 ; 0x60 <fun2+0x5c>
60: 9f 2d mov r25, r15
62: 8e 2d mov r24, r14
64: 7d 2d mov r23, r13
66: 6c 2d mov r22, r12
68: 00 d0 rcall .+0 ; 0x6a <fun2+0x66>
6a: 86 2f mov r24, r22
6c: 00 d0 rcall .+0 ; 0x6e <fun2+0x6a>
6e: cf 5f subi r28, 0xFF ; 255
70: ca 30 cpi r28, 0x0A ; 10
72: 01 f4 brne .+0 ; 0x74 <fun2+0x70>
74: cf 91 pop r28
76: ff 90 pop r15
78: ef 90 pop r14
7a: df 90 pop r13
7c: cf 90 pop r12
7e: bf 90 pop r11
80: af 90 pop r10
82: 9f 90 pop r9
84: 8f 90 pop r8
86: 08 95 ret
88: 20 e0 ldi r18, 0x00 ; 0
8a: 30 e0 ldi r19, 0x00 ; 0
8c: 4a e7 ldi r20, 0x7A ; 122
8e: 54 e4 ldi r21, 0x44 ; 68
90: 9b 2d mov r25, r11
92: 8a 2d mov r24, r10
94: 79 2d mov r23, r9
96: 68 2d mov r22, r8
98: 00 d0 rcall .+0 ; 0x9a <fun2+0x96>
9a: 00 d0 rcall .+0 ; 0x9c <fun2+0x98>
9c: 00 c0 rjmp .+0 ; 0x9e <fun2+0x9a>
9e: 81 e0 ldi r24, 0x01 ; 1
a0: 00 c0 rjmp .+0 ; 0xa2 <__SREG__+0x63>
hmm, how good is the optimizer?
void fun3 ( void )
{
uint8_t ra;
for(ra=20;ra<22;ra++) _delay_us(ra);
}
thought so
00000004 <fun3>:
4: 8a e6 ldi r24, 0x6A ; 106
6: 00 d0 rcall .+0 ; 0x8 <fun3+0x4>
8: 80 e7 ldi r24, 0x70 ; 112
a: 00 c0 rjmp .+0 ; 0xc <fun3+0x8>
figured the count to 10 would have done it.
A lot of times you will see delay loop functions like this get used with hardcoded values because the spec for whatever you are bit banging, etc has those values and it is easy when the thing says pop reset then wait 100us, you just call a delay with 100 in it. Now if there is one file with:
fun4(10);
and another file (another optimization domain) you have the above with this added:
void fun4 ( uint8_t x)
{
_delay_us(x);
}
you can then understand where this is headed...runtime...dont even need to compile it to see that it will have the problem. Now some compilers like llvm you can optimize across file domains, but they dont target the AVR, their MSP430 was a publicity stunt more than reality as it doesnt work and is not supported. Their arm support is obviously good, but they change their command line options pretty much every minor release, I have long since gotten tired trying to use them as I have to constantly change my makefiles to keep up, and their optimized code is sadly as not as fast as gccs despite gccs code getting worse every release and llvm getting a little better (worse/better are in the eyes of the beholder of course).
Initially I need to send and receive serially some data. The packet length is 48 bits.
For shorter packets (32 bits) I could do something like that:
unsigned long data=0x12345678;
for(i=0;i<32;i++){
if(data & 0x80000000)
setb_MOD;
else
clrb_MOD;
data <<= 1;
}
This code compilation is really pleasing me:
code<<=1;
ac: 88 0f add r24, r24
ae: 99 1f adc r25, r25
b0: aa 1f adc r26, r26
b2: bb 1f adc r27, r27
b4: 80 93 63 00 sts 0x0063, r24
b8: 90 93 64 00 sts 0x0064, r25
bc: a0 93 65 00 sts 0x0065, r26
c0: b0 93 66 00 sts 0x0066, r27
After I needed to extend the packet (to 48 bits) I faced with the need to shift an array:
unsigned char data[6]={0x12,0x34,0x56,0x78,0xAB,0xCD};
for(i=0;i<48;i++){
if(data[5] & 0x80)
setb_MOD;
else
clrb_MOD;
for(j=5;j>0;j--){
data[j]<<=1;
if(data[j-1] & 0x80)
data[j]+=1;
}
data[0] <<= 1;
}
The compilled code is slightly depends on an optimization settings but generally it is doing what I commanded in C:
for(j=5;j>0;j--){
code[j]<<=1;
a8: 82 91 ld r24, -Z
aa: 88 0f add r24, r24
ac: 80 83 st Z, r24
if(code[j-1]&0x80)
ae: 9e 91 ld r25, -X
b0: 97 fd sbrc r25, 7
b2: 13 c0 rjmp .+38 ; 0xda <__vector_2+0x74>
clrb_MOD;
}
else{
setb_MOD;
}
for(j=5;j>0;j--){
b4: 80 e0 ldi r24, 0x00 ; 0
b6: a3 36 cpi r26, 0x63 ; 99
b8: b8 07 cpc r27, r24
ba: b1 f7 brne .-20 ; 0xa8 <__vector_2+0x42>
code[j]<<=1;
if(code[j-1]&0x80)
code[j]+=1;
}
As you can see there is no obvious (for a human) solution to shift an array byte after byte.
I'd like to skip injection of inline assembler as I don't really manage this technique and I don't really understand how do I address C variables in Asm. Is there any alternatives?
If you know your input is less than 64 bits, you can do something like (assuming stdint.h is available, otherwise convert to unsigned long long, etc):
union BitShifter
{
uint64_t u64;
uint32_t u32[2];
uint16_t u16[4];
uint8_t u8[8];
};
union BitShifter MyBitshifter;
MyBitShifter.u64 <<= 1;
The compiler should use the best instruction to accomplish that (probably two 32 bit shifts and some other logic to get the bit from one word to another. Of course, the backend might be lazy and do it as bytes...
Depending on the endianness of the AVR, you'll have to swizzle your bytes in the right order to the outgoing bit order correct.
I am having trouble with the creation and addressing of an array created purely in assembly using the instruction set for the Atmel ATMega8535.
What I understand so far is as follows:
The array contains contiguous data that is equal in length.
The creation of the array involves defining the beginning and end locations of the array (much like you would the stack).
You would address an index in the array by adding an offset of the base address of the array.
What I am looking to do specifically is create a 1-D array of 8-bit integers with predefined values populating it during initialization it does not have to be written to, only addressed when needed. The problem ultimately lying in not being able to translate the logic into the assembly code.
I have tried with little progress to do so using support from the following books:
Some Assembly Required: Assembly Language Programming with the AVR Microcontroller by Timothy S Margush
Get Going with...AVR Microcontrollers by Peter Sharpe
Any help, advice or further resources would be greatly appreciated.
If your array is read-only, you do not need to copy it to RAM. You can
keep it in Flash and read it from there when needed. This will save you
precious RAM, at the cost of slower access (read from RAM is 2 cycles,
read from flash is 3 cycles).
You can declare your array like this:
.global my_array
.type my_array, #object
my_array:
.byte 12, 34, 56, 78
Then, to read a member of the array, you have to compute:
adress of member = array base address + member index
If your members were more than one byte, you would have to also multiply
the index by the size, but this is not the case here. Then, you put the
address of the required member in the Z register and issue an lpm
instruction. Here is a function implementing this logic:
.global read_data
; input: r24 = array index, r1 = 0
; output: r24 = array value
; clobbers: r30, r31
read_data:
ldi r30, lo8(my_array) ; load Z = address of my_array
ldi r31, hi8(my_array) ; ...high byte also
add r30, r24 ; add the array index
adc r31, r1 ; ...and add 0 to propagate the carry
lpm r24, Z
ret
#scottt advised you to first write in C, then look at the generated
assembly. I consider this very good advice, let's follow it:
#include <stdint.h>
__flash const uint8_t my_array[] = {12, 34, 56, 78};
uint8_t read_data(uint8_t index)
{
return my_array[index];
}
The __flash keyword identifying a “named address space” is an embedded
C extension supported by
gcc. The
generated assembly is slightly different from the previous one: instead
of computing base_address + index, gcc does index − (−base_address):
read_data:
mov r30, r24 ; load Z = array index
ldi r31, 0 ; ...high byte of index is 0
subi r30, lo8(-(my_array)) ; subtract -(address of my array)
sbci r31, hi8(-(my_array)) ; ...high byte also
lpm r24, Z
ret
This is just as efficient as the previous hand-rolled assembly, except
that it does not need the r1 register to be initialized to zero. But
keeping r1 to zero is part of the gcc ABI anyway, so it should make no
difference.
The role of the linker
This section is meant to answer the question in the comment: how can we
access the array if we do not know its address? The answer is: we access
it by its name, just like in the code snippets above. Choosing the final
address for the array, as well as replacing the name by the appropriate
address, is the linker’s job.
Assembling (with avr-gcc -c) and disassembling (with avr-objdump -d)
the first code snippet gives this:
my_array.o, section .text:
00000000 <my_array>:
0: 0c 22 38 4e ."8N
If we were compiling from C, gcc would have put the array in the
.progmem.data section instead of .text, but it makes little difference.
The numbers “0c 22 38 4e” are the array contents, in hex. The characters
to the right are the ASCII equivalents, ‘.’ being the placeholder for
non printing characters.
The object file also carries this symbol table, shown by avr-nm:
my_array.o:
00000000 T my_array
meaning the symbol “my_array” has been defined as referring to offset 0
into the .text section (implied by “T”) of this object.
Assembling and disassembling the second code snippet gives this:
read_data.o, section .text:
00000000 <read_data>:
0: e0 e0 ldi r30, 0x00
2: f0 e0 ldi r31, 0x00
4: e8 0f add r30, r24
6: f1 1d adc r31, r1
8: 84 91 lpm r24, Z
a: 08 95 ret
Comparing the disassembly with the actual source code, it can be seen
that the assembler replaced the address of my_array with 0x00, which is
almost guaranteed to be wrong. But it also left a note to the linker in
the form of “relocation records”, shown by avr-objdump -r:
read_data.o, RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000000 R_AVR_LO8_LDI my_array
00000002 R_AVR_HI8_LDI my_array
This tells the linker that the ldi instructions at offsets 0x00 and
0x02 are intended to load the low byte and the high byte (respectively)
of the final address of my_array. The object file also carries this
symbol table:
read_data.o:
U my_array
00000000 T read_data
where the “U” line means the file makes use of an undefined symbol named
“my_array”.
Linking these pieces together, with a suitable main(), yields a binary
containing the C runtime from avr-lbc, together with our code:
0000003c <my_array>:
3c: 0c 22 38 4e ."8N
00000040 <read_data>:
40: ec e3 ldi r30, 0x3C
42: f0 e0 ldi r31, 0x00
44: e8 0f add r30, r24
46: f1 1d adc r31, r1
48: 84 91 lpm r24, Z
4a: 08 95 ret
It should be noted that, not only has the linker moved the pieces around
to their final addresses, it has also fixed the arguments of the ldi
instructions so that they now point to the correct address of my_array.
The code should look something like this:
.section .text
.global main
main:
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.global data
.data
data:
.byte 1, 2, 3, 4
.comm output,1,1
Explanation
For people who have programmed in assembler using the GNU toolchain before, there are lessons that are transferable even to unfamiliar instruction sets:
You reserve space for an array with the assembler directives .byte 1, 2, 3, 4, .word 1, 2 (.word is 16 bits for AVR) or .space 100.
When learning a new instruction set, write C programs and ask the C compiler to generate assembler output. Find a good assembler programming reference for the instruction set as you read the assembler code.
Applying this trick below.
byte-array.c
/* volatile our code doesn't get optimized out even when compiler optimization is on */
volatile char output;
char data[] = { 1, 2, 3, 4 };
int main(void)
{
output = data[3];
output = data[0];
return 0;
}
Generate Assembler from C
avr-gcc -mmcu=atmega8 -Wall -Os -S byte-array.c
This will generate the assembler file byte-array.s.
byte-array.s
.file "byte-array.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.section .text.startup,"ax",#progbits
.global main
.type main, #function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.size main, .-main
.global data
.data
.type data, #object
.size data, 4
data:
.byte 1
.byte 2
.byte 3
.byte 4
.comm output,1,1
.ident "GCC: (Fedora 4.9.2-1.fc21) 4.9.2"
.global __do_copy_data
.global __do_clear_bss
Read this explanation of Pointer Registers to see how the AVR instruction set uses the r30, r31 register pair as the pointer register Z. Read up on the ld, st, ldi, ldd, sts and std instructions.
Implementation Notes
If you link the program then disassemble it:
avr-gcc -mmcu=atmega8 -Os byte-array.c -o byte-array.elf
avr-objdump -d byte-array.elf
00000000 <__vectors>:
0: 12 c0 rjmp .+36 ; 0x26 <__ctors_end>
2: 2c c0 rjmp .+88 ; 0x5c <__bad_interrupt>
4: 2b c0 rjmp .+86 ; 0x5c <__bad_interrupt>
6: 2a c0 rjmp .+84 ; 0x5c <__bad_interrupt>
8: 29 c0 rjmp .+82 ; 0x5c <__bad_interrupt>
a: 28 c0 rjmp .+80 ; 0x5c <__bad_interrupt>
c: 27 c0 rjmp .+78 ; 0x5c <__bad_interrupt>
e: 26 c0 rjmp .+76 ; 0x5c <__bad_interrupt>
10: 25 c0 rjmp .+74 ; 0x5c <__bad_interrupt>
12: 24 c0 rjmp .+72 ; 0x5c <__bad_interrupt>
14: 23 c0 rjmp .+70 ; 0x5c <__bad_interrupt>
16: 22 c0 rjmp .+68 ; 0x5c <__bad_interrupt>
18: 21 c0 rjmp .+66 ; 0x5c <__bad_interrupt>
1a: 20 c0 rjmp .+64 ; 0x5c <__bad_interrupt>
1c: 1f c0 rjmp .+62 ; 0x5c <__bad_interrupt>
1e: 1e c0 rjmp .+60 ; 0x5c <__bad_interrupt>
20: 1d c0 rjmp .+58 ; 0x5c <__bad_interrupt>
22: 1c c0 rjmp .+56 ; 0x5c <__bad_interrupt>
24: 1b c0 rjmp .+54 ; 0x5c <__bad_interrupt>
00000026 <__ctors_end>:
26: 11 24 eor r1, r1
28: 1f be out 0x3f, r1 ; 63
2a: cf e5 ldi r28, 0x5F ; 95
2c: d4 e0 ldi r29, 0x04 ; 4
2e: de bf out 0x3e, r29 ; 62
30: cd bf out 0x3d, r28 ; 61
00000032 <__do_copy_data>:
32: 10 e0 ldi r17, 0x00 ; 0
34: a0 e6 ldi r26, 0x60 ; 96
36: b0 e0 ldi r27, 0x00 ; 0
38: e4 e8 ldi r30, 0x84 ; 132
3a: f0 e0 ldi r31, 0x00 ; 0
3c: 02 c0 rjmp .+4 ; 0x42 <__SREG__+0x3>
3e: 05 90 lpm r0, Z+
40: 0d 92 st X+, r0
42: ac 36 cpi r26, 0x6C ; 108
44: b1 07 cpc r27, r17
46: d9 f7 brne .-10 ; 0x3e <__SP_H__>
00000048 <__do_clear_bss>:
48: 10 e0 ldi r17, 0x00 ; 0
4a: ac e6 ldi r26, 0x6C ; 108
4c: b0 e0 ldi r27, 0x00 ; 0
4e: 01 c0 rjmp .+2 ; 0x52 <.do_clear_bss_start>
00000050 <.do_clear_bss_loop>:
50: 1d 92 st X+, r1
00000052 <.do_clear_bss_start>:
52: ad 36 cpi r26, 0x6D ; 109
54: b1 07 cpc r27, r17
56: e1 f7 brne .-8 ; 0x50 <.do_clear_bss_loop>
58: 02 d0 rcall .+4 ; 0x5e <main>
5a: 12 c0 rjmp .+36 ; 0x80 <_exit>
0000005c <__bad_interrupt>:
5c: d1 cf rjmp .-94 ; 0x0 <__vectors>
0000005e <main>: ...
00000080 <_exit>:
80: f8 94 cli
00000082 <__stop_program>:
82: ff cf rjmp .-2 ; 0x82 <__stop_program>
You can see avr-gcc automatically generates startup code, including:
the interrupt vector (__vectors), which uses rjmp to jump to the Interrupt Service Routines.
initialize the status register, SREG , and the stack pointer, SPL/SPH (__ctors_end)
copies the data segment content from FLASH to RAM for initialized, writable global variables (__do_copy_data)
clears the BSS segment for uninitialized writable global variables (__do_clear_bss etc)
calls our main() function
calls _exit() if main() ever returns
_exit() is just a cli to disable interrupts
and an infinite loop (__stop_program)
I am trying to add inline comments to the generated assembly on my arduino. So for instance
/*
Testing
*/
#include <avr/io.h>
#include <iostream>
int ledPin = 13;
void setup()
{
asm volatile("\n# comment 1");
pinMode(ledPin, OUTPUT);
}
void loop()
{
asm volatile("\n# comment 2");
digitalWrite(ledPin, HIGH);
delay(1000);
digitalWrite(ledPin, LOW);
delay(1000);
}
when the assembly is generated for the code i want to see "comment 1" and "comment 2" as a marker at that particular line in the assembly code.
Here is the assembly generated code
C:\Users\****\AppData\Local\Temp\build1888832469367065438.tmp\sketch_jul17a.cpp.o: file format elf32-avr
Disassembly of section .text.loop:
00000000 <loop>:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
e: 80 e0 ldi r24, 0x00 ; 0
10: 90 e0 ldi r25, 0x00 ; 0
12: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:21
16: 80 91 00 00 lds r24, 0x0000
1a: 60 e0 ldi r22, 0x00 ; 0
1c: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:22
20: 68 ee ldi r22, 0xE8 ; 232
22: 73 e0 ldi r23, 0x03 ; 3
24: 80 e0 ldi r24, 0x00 ; 0
26: 90 e0 ldi r25, 0x00 ; 0
28: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:23
2c: 08 95 ret
Disassembly of section .text.setup:
00000000 <setup>:
setup():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:13
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <setup>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:14
a: 08 95 ret
The comments are not included in the assembly code, how can I do this
First of all comments in "regular" (not cpp pre-processed) assembler code begin with a hash sign, not with two slashes. So you might want that a comment named "# onesectimer" is in the assembler code.
This may be archieved the following way:
asm("\n# onesectimer");
void OneSecTimer()
{
if(bags!=0){
asm("\n# for counter 1");
if(counter1 == 3)
...
--- Edit ---
Reading your comments and your edits I think you are mixing the words "assembly" and "disassembly":
When translating C code to binary code the C compiler generates "assembly" code. This code may contain comments:
# This is a comment
lds r24, 0
ldi r22, 1
call digitalWrite
The assembler then translates this "assembly" code to binary code. In binary code there is no information about comments any more but only the binary data to be written to the memory.
The "disassembly" translates the binary data back to assembly code but only information that is present in binary code may be disassembled - so you cannot have any comments in disassembly code!
What you may do is to insert a symbol into the object file at a point of interest:
digitalWrite(0,1);
asm volatile(".global Here_is_Delay\nHere_is_Delay:");
delay(1000);
The names of these symbols must be unique over the whole project and not be identical to any function or variable name used.
Depending on the disassembler (not sure about the AVR one) you'll see the symbols then:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
Here_is_Delay():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
...