Bitwise shift of an array of chars in AVR C

Bitwise shift of an array of chars in AVR C - c

Initially I need to send and receive serially some data. The packet length is 48 bits.
For shorter packets (32 bits) I could do something like that:
unsigned long data=0x12345678;
for(i=0;i<32;i++){
if(data & 0x80000000)
setb_MOD;
else
clrb_MOD;
data <<= 1;
}
This code compilation is really pleasing me:
code<<=1;
ac: 88 0f add r24, r24
ae: 99 1f adc r25, r25
b0: aa 1f adc r26, r26
b2: bb 1f adc r27, r27
b4: 80 93 63 00 sts 0x0063, r24
b8: 90 93 64 00 sts 0x0064, r25
bc: a0 93 65 00 sts 0x0065, r26
c0: b0 93 66 00 sts 0x0066, r27
After I needed to extend the packet (to 48 bits) I faced with the need to shift an array:
unsigned char data[6]={0x12,0x34,0x56,0x78,0xAB,0xCD};
for(i=0;i<48;i++){
if(data[5] & 0x80)
setb_MOD;
else
clrb_MOD;
for(j=5;j>0;j--){
data[j]<<=1;
if(data[j-1] & 0x80)
data[j]+=1;
}
data[0] <<= 1;
}
The compilled code is slightly depends on an optimization settings but generally it is doing what I commanded in C:
for(j=5;j>0;j--){
code[j]<<=1;
a8: 82 91 ld r24, -Z
aa: 88 0f add r24, r24
ac: 80 83 st Z, r24
if(code[j-1]&0x80)
ae: 9e 91 ld r25, -X
b0: 97 fd sbrc r25, 7
b2: 13 c0 rjmp .+38 ; 0xda <__vector_2+0x74>
clrb_MOD;
}
else{
setb_MOD;
}
for(j=5;j>0;j--){
b4: 80 e0 ldi r24, 0x00 ; 0
b6: a3 36 cpi r26, 0x63 ; 99
b8: b8 07 cpc r27, r24
ba: b1 f7 brne .-20 ; 0xa8 <__vector_2+0x42>
code[j]<<=1;
if(code[j-1]&0x80)
code[j]+=1;
}
As you can see there is no obvious (for a human) solution to shift an array byte after byte.
I'd like to skip injection of inline assembler as I don't really manage this technique and I don't really understand how do I address C variables in Asm. Is there any alternatives?

If you know your input is less than 64 bits, you can do something like (assuming stdint.h is available, otherwise convert to unsigned long long, etc):
union BitShifter
{
uint64_t u64;
uint32_t u32[2];
uint16_t u16[4];
uint8_t u8[8];
};
union BitShifter MyBitshifter;
MyBitShifter.u64 <<= 1;
The compiler should use the best instruction to accomplish that (probably two 32 bit shifts and some other logic to get the bit from one word to another. Of course, the backend might be lazy and do it as bytes...
Depending on the endianness of the AVR, you'll have to swizzle your bytes in the right order to the outgoing bit order correct.

Related

How could i define a function at run-time in c

I am trying to define and call a function at run-time with c language in arm cpu(cortex a72). in order to do that, i implemented a code like below:
#include <stdio.h>
#include <sys/mman.h>
char* ibuf;
int pbuf = 0;
#define ADD_BYTE(val) do{ibuf[pbuf] = val; pbuf++;} while(0)
void (*routine)(void);
void MakeRoutineSimpleFunc(void)
{
//nop
ADD_BYTE(0x00);
ADD_BYTE(0xf0);
ADD_BYTE(0x20);
ADD_BYTE(0xe3);
//bx lr
ADD_BYTE(0x1e);
ADD_BYTE(0xff);
ADD_BYTE(0x2f);
ADD_BYTE(0xe1);
}
int main(void)
{
ibuf = (char*)mmap(NULL, 4 * 1024, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_POPULATE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
MakeRoutineSimpleFunc();
routine = (void(*)())(ibuf);
routine();
}
as you can see, in above code at first i allocate an executable memory region and assign address of that to ibuf, then i put some simple instruction in ibuf (a "nop" and "bx lr" that means return in arm) and then i try to call this function through a function pointer.
but when i want to call function through function pointer i got an "segmentation fault" error. BTW when i try to run the app with GDB debugger program run successfully without any error.
there is anything that i missed in above code that cause "segmentation fault"?
i want to add, when i add above instructions (a "nop" and "bx lr" that means return in arm) at compile-time to a function like below, function work without any error.
void f2(void)
{
__asm__ volatile (".byte 0x00, 0xf0, 0x20, 0xe3");
__asm__ volatile (".byte 0x1e, 0xff, 0x2f, 0xe1");
}
EDIT1:
in order to check validity of run-time function, i have removed f2 prolog and epilogue with ghidra disassembler, so assembly code of f2 is like this:
**************************************************************
FUNCTION
**************************************************************
undefined FUN_0000083c()
undefined r0:1 <RETURN>
undefined4 Stack[-0x4]:4 local_4
FUN_0000083c XREF[1]: FUN_00000868:000008a4(c)
0000083c 00 f0 20 e3 nop
00000840 00 f0 20 e3 nop
00000844 00 f0 20 e3 nop
00000848 00 f0 20 e3 nop
0000084c 00 f0 20 e3 nop
00000850 00 f0 20 e3 nop
00000854 1e ff 2f e1 bx lr
00000858 00 f0 20 e3 nop
0000085c 00 f0 20 e3 nop
00000860 00 f0 20 e3 nop
00000864 00 f0 20 e3 nop
and also it work again without problem.
EDIT2:
something that i want to add that may be helpful to solve the problem, as i saw in the assembler, compiler call "routine" function with "blx r3" instruction while it call other functions with "bl 'symbol name'". as i know blx can change processor state from ARM to Thumb or vise versa. could this point cause the problem?
EDIT3:
disassemble of main function is something like below:
**************************************************************
FUNCTION
**************************************************************
int __stdcall main(void)
int r0:4 <RETURN>
undefined4 Stack[-0xc]:4 local_c XREF[1]: 00010d44(W)
undefined4 Stack[-0x10]:4 local_10 XREF[1]: 00010d4c(W)
main XREF[4]: Entry Point(*),
_start:00010394(*), 000103a8(*),
.debug_frame::000000a0(*)
00010d34 00 48 2d e9 stmdb sp!,{ r11 lr }
00010d38 04 b0 8d e2 add r11,sp,#0x4
00010d3c 08 d0 4d e2 sub sp,sp,#0x8
00010d40 00 30 a0 e3 mov r3,#0x0
00010d44 04 30 8d e5 str r3,[sp,#local_c]
00010d48 00 30 e0 e3 mvn r3,#0x0
00010d4c 00 30 8d e5 str r3,[sp,#0x0]=>local_10
00010d50 22 30 a0 e3 mov r3,#0x22
00010d54 07 20 a0 e3 mov r2,#0x7
00010d58 01 1a a0 e3 mov r1,#0x1000
00010d5c 00 00 a0 e3 mov r0,#0x0
00010d60 7d fd ff eb bl mmap
00010d64 00 20 a0 e1 cpy r2,r0
00010d68 50 30 9f e5 ldr r3,[->ibuf]
00010d6c 00 20 83 e5 str r2,[r3,#0x0]=>ibuf
00010d70 48 30 9f e5 ldr r3,[->ibuf]
00010d74 00 30 93 e5 ldr r3,[r3,#0x0]=>ibuf
00010d78 03 10 a0 e1 cpy r1,r3
00010d7c 40 00 9f e5 ldr r0=>s_ibuf:_%x_00010e40,[PTR_s_ibuf:_%x_00010d
00010d80 69 fd ff eb bl printf
00010d84 ae fe ff eb bl MakeRoutineSimpleFunc
00010d88 30 30 9f e5 ldr r3,[->ibuf]
00010d8c 00 30 93 e5 ldr r3,[r3,#0x0]=>ibuf
00010d90 03 20 a0 e1 cpy r2,r3
00010d94 2c 30 9f e5 ldr r3,[->routine]
00010d98 00 20 83 e5 str r2,[r3,#0x0]=>routine
00010d9c 24 30 9f e5 ldr r3,[->routine]
00010da0 00 30 93 e5 ldr r3,[r3,#0x0]=>routine
00010da4 33 ff 2f e1 blx r3
00010da8 1c 00 9f e5 ldr r0=>DAT_00010e4c,
00010dac 61 fd ff eb bl puts
00010db0 00 30 a0 e3 mov r3,#0x0
00010db4 03 00 a0 e1 cpy r0,r3
00010db8 04 d0 4b e2 sub sp,r11,#0x4
00010dbc 00 88 bd e8 ldmia sp!,{ r11 pc }
as you can see, routine called with "blx r3" instruction at address "00010da4". and also i printed address of ibuf, it is was "0xb6ff8000".

I think, you can enter the opcodes directly in a string "binary-code" and execute the code using ((void*)STRING)(). However, you may want to read also about how gcc implements trampolines, because this is how gcc generates code that creates code on the stack and jumps the execution there.

_delay_us(); timing explanation

I made a program with atmega16 and was trying to make my own delay_us() so I looked into the compiler avr-gcc library for _delay_us() function and this is its code:
static inline void _delay_us(double __us) __attribute__((always_inline));
/*
\ingroup util_delay
Perform a delay of \c __us microseconds, using _delay_loop_1().
The macro F_CPU is supposed to be defined to a
constant defining the CPU clock frequency (in Hertz).
The maximal possible delay is 768 us / F_CPU in MHz.
If the user requests a delay greater than the maximal possible one,
_delay_us() will automatically call _delay_ms() instead. The user
will not be informed about this case.
*/
void
_delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us; //number of ticks per us * delay time in us
if (__tmp < 1.0)
__ticks = 1;
else if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
__ticks = (uint8_t)__tmp;
_delay_loop_1(__ticks); // function decrements ticks untill it reaches 0( takes 3 cycles)
}
I got confused about how, if I use a 1Mhz clock, this function that contains floating point arithmetic will be able to make small delays (like _delay_us(10)), because executing all the the setup code will definitely take more time than that. So I wrote this program:
#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#define F_CPU 1000000UL
int main()
{
_delay_ms(1000);
DDRB=0XFF;
PORTB=0XFF;
_delay_us(10);
PORTB=0;
for(;;){}
return 0;
}
I simulated it using protues and used an oscilloscope and connected one of PORTB pins to its input. Then I saw that the delay was exactly 10 us. How could the delay be that accurate considering this set up code statement that uses floating point arithmetic:
double __tmp = ((F_CPU) / 4e3) * __ms;
This should have taken a lot of cycles that makes _delay_us(10) exceed the 10 us period, but the time was exactly 10us!!

All the floating point arithmetic is calculated by the preprocessor as these delay functions are actually macros. So at the point the mcu executes the code, all that's left is a loop that uses an integer to do the delay.

typedef unsigned char uint8_t;
#define F_CPU 16000000
extern void _delay_loop_1( uint8_t );
static void _delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us;
if (__tmp < 1.0)
{
__ticks = 1;
}
else
{
if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
{
__ticks = (uint8_t)__tmp;
}
}
_delay_loop_1(__ticks);
}
void fun1 ( void )
{
_delay_us(10);
}
which with gcc can produce this:
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <__zero_reg__+0x3>
the number to feed _delay_loop_1 is computed at compile time not runtime, all of that dead code goes away.
but add this:
void fun2 ( void )
{
uint8_t ra;
for(ra=1;ra<10;ra++) _delay_us(ra);
}
and things change dramatically.
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <fun2>
00000004 <fun2>:
4: 8f 92 push r8
6: 9f 92 push r9
8: af 92 push r10
a: bf 92 push r11
c: cf 92 push r12
e: df 92 push r13
10: ef 92 push r14
12: ff 92 push r15
14: cf 93 push r28
16: c1 e0 ldi r28, 0x01 ; 1
18: 6c 2f mov r22, r28
1a: 70 e0 ldi r23, 0x00 ; 0
1c: 80 e0 ldi r24, 0x00 ; 0
1e: 90 e0 ldi r25, 0x00 ; 0
20: 00 d0 rcall .+0 ; 0x22 <fun2+0x1e>
22: 86 2e mov r8, r22
24: 97 2e mov r9, r23
26: a8 2e mov r10, r24
28: b9 2e mov r11, r25
2a: 2b ea ldi r18, 0xAB ; 171
2c: 3a ea ldi r19, 0xAA ; 170
2e: 4a ea ldi r20, 0xAA ; 170
30: 50 e4 ldi r21, 0x40 ; 64
32: 00 d0 rcall .+0 ; 0x34 <fun2+0x30>
34: c6 2e mov r12, r22
36: d7 2e mov r13, r23
38: e8 2e mov r14, r24
3a: f9 2e mov r15, r25
3c: 20 e0 ldi r18, 0x00 ; 0
3e: 30 e0 ldi r19, 0x00 ; 0
40: 40 e8 ldi r20, 0x80 ; 128
42: 5f e3 ldi r21, 0x3F ; 63
44: 00 d0 rcall .+0 ; 0x46 <fun2+0x42>
46: 87 fd sbrc r24, 7
48: 00 c0 rjmp .+0 ; 0x4a <fun2+0x46>
4a: 20 e0 ldi r18, 0x00 ; 0
4c: 30 e0 ldi r19, 0x00 ; 0
4e: 4f e7 ldi r20, 0x7F ; 127
50: 53 e4 ldi r21, 0x43 ; 67
52: 9f 2d mov r25, r15
54: 8e 2d mov r24, r14
56: 7d 2d mov r23, r13
58: 6c 2d mov r22, r12
5a: 00 d0 rcall .+0 ; 0x5c <fun2+0x58>
5c: 18 16 cp r1, r24
5e: 04 f0 brlt .+0 ; 0x60 <fun2+0x5c>
60: 9f 2d mov r25, r15
62: 8e 2d mov r24, r14
64: 7d 2d mov r23, r13
66: 6c 2d mov r22, r12
68: 00 d0 rcall .+0 ; 0x6a <fun2+0x66>
6a: 86 2f mov r24, r22
6c: 00 d0 rcall .+0 ; 0x6e <fun2+0x6a>
6e: cf 5f subi r28, 0xFF ; 255
70: ca 30 cpi r28, 0x0A ; 10
72: 01 f4 brne .+0 ; 0x74 <fun2+0x70>
74: cf 91 pop r28
76: ff 90 pop r15
78: ef 90 pop r14
7a: df 90 pop r13
7c: cf 90 pop r12
7e: bf 90 pop r11
80: af 90 pop r10
82: 9f 90 pop r9
84: 8f 90 pop r8
86: 08 95 ret
88: 20 e0 ldi r18, 0x00 ; 0
8a: 30 e0 ldi r19, 0x00 ; 0
8c: 4a e7 ldi r20, 0x7A ; 122
8e: 54 e4 ldi r21, 0x44 ; 68
90: 9b 2d mov r25, r11
92: 8a 2d mov r24, r10
94: 79 2d mov r23, r9
96: 68 2d mov r22, r8
98: 00 d0 rcall .+0 ; 0x9a <fun2+0x96>
9a: 00 d0 rcall .+0 ; 0x9c <fun2+0x98>
9c: 00 c0 rjmp .+0 ; 0x9e <fun2+0x9a>
9e: 81 e0 ldi r24, 0x01 ; 1
a0: 00 c0 rjmp .+0 ; 0xa2 <__SREG__+0x63>
hmm, how good is the optimizer?
void fun3 ( void )
{
uint8_t ra;
for(ra=20;ra<22;ra++) _delay_us(ra);
}
thought so
00000004 <fun3>:
4: 8a e6 ldi r24, 0x6A ; 106
6: 00 d0 rcall .+0 ; 0x8 <fun3+0x4>
8: 80 e7 ldi r24, 0x70 ; 112
a: 00 c0 rjmp .+0 ; 0xc <fun3+0x8>
figured the count to 10 would have done it.
A lot of times you will see delay loop functions like this get used with hardcoded values because the spec for whatever you are bit banging, etc has those values and it is easy when the thing says pop reset then wait 100us, you just call a delay with 100 in it. Now if there is one file with:
fun4(10);
and another file (another optimization domain) you have the above with this added:
void fun4 ( uint8_t x)
{
_delay_us(x);
}
you can then understand where this is headed...runtime...dont even need to compile it to see that it will have the problem. Now some compilers like llvm you can optimize across file domains, but they dont target the AVR, their MSP430 was a publicity stunt more than reality as it doesnt work and is not supported. Their arm support is obviously good, but they change their command line options pretty much every minor release, I have long since gotten tired trying to use them as I have to constantly change my makefiles to keep up, and their optimized code is sadly as not as fast as gccs despite gccs code getting worse every release and llvm getting a little better (worse/better are in the eyes of the beholder of course).

Creation and addressing arrays in AVR Assembly (Using the ATMega8535)

I am having trouble with the creation and addressing of an array created purely in assembly using the instruction set for the Atmel ATMega8535.
What I understand so far is as follows:
The array contains contiguous data that is equal in length.
The creation of the array involves defining the beginning and end locations of the array (much like you would the stack).
You would address an index in the array by adding an offset of the base address of the array.
What I am looking to do specifically is create a 1-D array of 8-bit integers with predefined values populating it during initialization it does not have to be written to, only addressed when needed. The problem ultimately lying in not being able to translate the logic into the assembly code.
I have tried with little progress to do so using support from the following books:
Some Assembly Required: Assembly Language Programming with the AVR Microcontroller by Timothy S Margush
Get Going with...AVR Microcontrollers by Peter Sharpe
Any help, advice or further resources would be greatly appreciated.

If your array is read-only, you do not need to copy it to RAM. You can
keep it in Flash and read it from there when needed. This will save you
precious RAM, at the cost of slower access (read from RAM is 2 cycles,
read from flash is 3 cycles).
You can declare your array like this:
.global my_array
.type my_array, #object
my_array:
.byte 12, 34, 56, 78
Then, to read a member of the array, you have to compute:
adress of member = array base address + member index
If your members were more than one byte, you would have to also multiply
the index by the size, but this is not the case here. Then, you put the
address of the required member in the Z register and issue an lpm
instruction. Here is a function implementing this logic:
.global read_data
; input: r24 = array index, r1 = 0
; output: r24 = array value
; clobbers: r30, r31
read_data:
ldi r30, lo8(my_array) ; load Z = address of my_array
ldi r31, hi8(my_array) ; ...high byte also
add r30, r24 ; add the array index
adc r31, r1 ; ...and add 0 to propagate the carry
lpm r24, Z
ret
#scottt advised you to first write in C, then look at the generated
assembly. I consider this very good advice, let's follow it:
#include <stdint.h>
__flash const uint8_t my_array[] = {12, 34, 56, 78};
uint8_t read_data(uint8_t index)
{
return my_array[index];
}
The __flash keyword identifying a “named address space” is an embedded
C extension supported by
gcc. The
generated assembly is slightly different from the previous one: instead
of computing base_address + index, gcc does index − (−base_address):
read_data:
mov r30, r24 ; load Z = array index
ldi r31, 0 ; ...high byte of index is 0
subi r30, lo8(-(my_array)) ; subtract -(address of my array)
sbci r31, hi8(-(my_array)) ; ...high byte also
lpm r24, Z
ret
This is just as efficient as the previous hand-rolled assembly, except
that it does not need the r1 register to be initialized to zero. But
keeping r1 to zero is part of the gcc ABI anyway, so it should make no
difference.
The role of the linker
This section is meant to answer the question in the comment: how can we
access the array if we do not know its address? The answer is: we access
it by its name, just like in the code snippets above. Choosing the final
address for the array, as well as replacing the name by the appropriate
address, is the linker’s job.
Assembling (with avr-gcc -c) and disassembling (with avr-objdump -d)
the first code snippet gives this:
my_array.o, section .text:
00000000 <my_array>:
0: 0c 22 38 4e ."8N
If we were compiling from C, gcc would have put the array in the
.progmem.data section instead of .text, but it makes little difference.
The numbers “0c 22 38 4e” are the array contents, in hex. The characters
to the right are the ASCII equivalents, ‘.’ being the placeholder for
non printing characters.
The object file also carries this symbol table, shown by avr-nm:
my_array.o:
00000000 T my_array
meaning the symbol “my_array” has been defined as referring to offset 0
into the .text section (implied by “T”) of this object.
Assembling and disassembling the second code snippet gives this:
read_data.o, section .text:
00000000 <read_data>:
0: e0 e0 ldi r30, 0x00
2: f0 e0 ldi r31, 0x00
4: e8 0f add r30, r24
6: f1 1d adc r31, r1
8: 84 91 lpm r24, Z
a: 08 95 ret
Comparing the disassembly with the actual source code, it can be seen
that the assembler replaced the address of my_array with 0x00, which is
almost guaranteed to be wrong. But it also left a note to the linker in
the form of “relocation records”, shown by avr-objdump -r:
read_data.o, RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000000 R_AVR_LO8_LDI my_array
00000002 R_AVR_HI8_LDI my_array
This tells the linker that the ldi instructions at offsets 0x00 and
0x02 are intended to load the low byte and the high byte (respectively)
of the final address of my_array. The object file also carries this
symbol table:
read_data.o:
U my_array
00000000 T read_data
where the “U” line means the file makes use of an undefined symbol named
“my_array”.
Linking these pieces together, with a suitable main(), yields a binary
containing the C runtime from avr-lbc, together with our code:
0000003c <my_array>:
3c: 0c 22 38 4e ."8N
00000040 <read_data>:
40: ec e3 ldi r30, 0x3C
42: f0 e0 ldi r31, 0x00
44: e8 0f add r30, r24
46: f1 1d adc r31, r1
48: 84 91 lpm r24, Z
4a: 08 95 ret
It should be noted that, not only has the linker moved the pieces around
to their final addresses, it has also fixed the arguments of the ldi
instructions so that they now point to the correct address of my_array.

The code should look something like this:
.section .text
.global main
main:
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.global data
.data
data:
.byte 1, 2, 3, 4
.comm output,1,1
Explanation
For people who have programmed in assembler using the GNU toolchain before, there are lessons that are transferable even to unfamiliar instruction sets:
You reserve space for an array with the assembler directives .byte 1, 2, 3, 4, .word 1, 2 (.word is 16 bits for AVR) or .space 100.
When learning a new instruction set, write C programs and ask the C compiler to generate assembler output. Find a good assembler programming reference for the instruction set as you read the assembler code.
Applying this trick below.
byte-array.c
/* volatile our code doesn't get optimized out even when compiler optimization is on */
volatile char output;
char data[] = { 1, 2, 3, 4 };
int main(void)
{
output = data[3];
output = data[0];
return 0;
}
Generate Assembler from C
avr-gcc -mmcu=atmega8 -Wall -Os -S byte-array.c
This will generate the assembler file byte-array.s.
byte-array.s
.file "byte-array.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.section .text.startup,"ax",#progbits
.global main
.type main, #function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.size main, .-main
.global data
.data
.type data, #object
.size data, 4
data:
.byte 1
.byte 2
.byte 3
.byte 4
.comm output,1,1
.ident "GCC: (Fedora 4.9.2-1.fc21) 4.9.2"
.global __do_copy_data
.global __do_clear_bss
Read this explanation of Pointer Registers to see how the AVR instruction set uses the r30, r31 register pair as the pointer register Z. Read up on the ld, st, ldi, ldd, sts and std instructions.
Implementation Notes
If you link the program then disassemble it:
avr-gcc -mmcu=atmega8 -Os byte-array.c -o byte-array.elf
avr-objdump -d byte-array.elf
00000000 <__vectors>:
0: 12 c0 rjmp .+36 ; 0x26 <__ctors_end>
2: 2c c0 rjmp .+88 ; 0x5c <__bad_interrupt>
4: 2b c0 rjmp .+86 ; 0x5c <__bad_interrupt>
6: 2a c0 rjmp .+84 ; 0x5c <__bad_interrupt>
8: 29 c0 rjmp .+82 ; 0x5c <__bad_interrupt>
a: 28 c0 rjmp .+80 ; 0x5c <__bad_interrupt>
c: 27 c0 rjmp .+78 ; 0x5c <__bad_interrupt>
e: 26 c0 rjmp .+76 ; 0x5c <__bad_interrupt>
10: 25 c0 rjmp .+74 ; 0x5c <__bad_interrupt>
12: 24 c0 rjmp .+72 ; 0x5c <__bad_interrupt>
14: 23 c0 rjmp .+70 ; 0x5c <__bad_interrupt>
16: 22 c0 rjmp .+68 ; 0x5c <__bad_interrupt>
18: 21 c0 rjmp .+66 ; 0x5c <__bad_interrupt>
1a: 20 c0 rjmp .+64 ; 0x5c <__bad_interrupt>
1c: 1f c0 rjmp .+62 ; 0x5c <__bad_interrupt>
1e: 1e c0 rjmp .+60 ; 0x5c <__bad_interrupt>
20: 1d c0 rjmp .+58 ; 0x5c <__bad_interrupt>
22: 1c c0 rjmp .+56 ; 0x5c <__bad_interrupt>
24: 1b c0 rjmp .+54 ; 0x5c <__bad_interrupt>
00000026 <__ctors_end>:
26: 11 24 eor r1, r1
28: 1f be out 0x3f, r1 ; 63
2a: cf e5 ldi r28, 0x5F ; 95
2c: d4 e0 ldi r29, 0x04 ; 4
2e: de bf out 0x3e, r29 ; 62
30: cd bf out 0x3d, r28 ; 61
00000032 <__do_copy_data>:
32: 10 e0 ldi r17, 0x00 ; 0
34: a0 e6 ldi r26, 0x60 ; 96
36: b0 e0 ldi r27, 0x00 ; 0
38: e4 e8 ldi r30, 0x84 ; 132
3a: f0 e0 ldi r31, 0x00 ; 0
3c: 02 c0 rjmp .+4 ; 0x42 <__SREG__+0x3>
3e: 05 90 lpm r0, Z+
40: 0d 92 st X+, r0
42: ac 36 cpi r26, 0x6C ; 108
44: b1 07 cpc r27, r17
46: d9 f7 brne .-10 ; 0x3e <__SP_H__>
00000048 <__do_clear_bss>:
48: 10 e0 ldi r17, 0x00 ; 0
4a: ac e6 ldi r26, 0x6C ; 108
4c: b0 e0 ldi r27, 0x00 ; 0
4e: 01 c0 rjmp .+2 ; 0x52 <.do_clear_bss_start>
00000050 <.do_clear_bss_loop>:
50: 1d 92 st X+, r1
00000052 <.do_clear_bss_start>:
52: ad 36 cpi r26, 0x6D ; 109
54: b1 07 cpc r27, r17
56: e1 f7 brne .-8 ; 0x50 <.do_clear_bss_loop>
58: 02 d0 rcall .+4 ; 0x5e <main>
5a: 12 c0 rjmp .+36 ; 0x80 <_exit>
0000005c <__bad_interrupt>:
5c: d1 cf rjmp .-94 ; 0x0 <__vectors>
0000005e <main>: ...
00000080 <_exit>:
80: f8 94 cli
00000082 <__stop_program>:
82: ff cf rjmp .-2 ; 0x82 <__stop_program>
You can see avr-gcc automatically generates startup code, including:
the interrupt vector (__vectors), which uses rjmp to jump to the Interrupt Service Routines.
initialize the status register, SREG , and the stack pointer, SPL/SPH (__ctors_end)
copies the data segment content from FLASH to RAM for initialized, writable global variables (__do_copy_data)
clears the BSS segment for uninitialized writable global variables (__do_clear_bss etc)
calls our main() function
calls _exit() if main() ever returns
_exit() is just a cli to disable interrupts
and an infinite loop (__stop_program)

How to add inline comments to generated assembly from arduino

I am trying to add inline comments to the generated assembly on my arduino. So for instance
/*
Testing
*/
#include <avr/io.h>
#include <iostream>
int ledPin = 13;
void setup()
{
asm volatile("\n# comment 1");
pinMode(ledPin, OUTPUT);
}
void loop()
{
asm volatile("\n# comment 2");
digitalWrite(ledPin, HIGH);
delay(1000);
digitalWrite(ledPin, LOW);
delay(1000);
}
when the assembly is generated for the code i want to see "comment 1" and "comment 2" as a marker at that particular line in the assembly code.
Here is the assembly generated code
C:\Users\****\AppData\Local\Temp\build1888832469367065438.tmp\sketch_jul17a.cpp.o: file format elf32-avr
Disassembly of section .text.loop:
00000000 <loop>:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
e: 80 e0 ldi r24, 0x00 ; 0
10: 90 e0 ldi r25, 0x00 ; 0
12: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:21
16: 80 91 00 00 lds r24, 0x0000
1a: 60 e0 ldi r22, 0x00 ; 0
1c: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:22
20: 68 ee ldi r22, 0xE8 ; 232
22: 73 e0 ldi r23, 0x03 ; 3
24: 80 e0 ldi r24, 0x00 ; 0
26: 90 e0 ldi r25, 0x00 ; 0
28: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:23
2c: 08 95 ret
Disassembly of section .text.setup:
00000000 <setup>:
setup():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:13
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <setup>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:14
a: 08 95 ret
The comments are not included in the assembly code, how can I do this

First of all comments in "regular" (not cpp pre-processed) assembler code begin with a hash sign, not with two slashes. So you might want that a comment named "# onesectimer" is in the assembler code.
This may be archieved the following way:
asm("\n# onesectimer");
void OneSecTimer()
{
if(bags!=0){
asm("\n# for counter 1");
if(counter1 == 3)
...
--- Edit ---
Reading your comments and your edits I think you are mixing the words "assembly" and "disassembly":
When translating C code to binary code the C compiler generates "assembly" code. This code may contain comments:
# This is a comment
lds r24, 0
ldi r22, 1
call digitalWrite
The assembler then translates this "assembly" code to binary code. In binary code there is no information about comments any more but only the binary data to be written to the memory.
The "disassembly" translates the binary data back to assembly code but only information that is present in binary code may be disassembled - so you cannot have any comments in disassembly code!
What you may do is to insert a symbol into the object file at a point of interest:
digitalWrite(0,1);
asm volatile(".global Here_is_Delay\nHere_is_Delay:");
delay(1000);
The names of these symbols must be unique over the whole project and not be identical to any function or variable name used.
Depending on the disassembler (not sure about the AVR one) you'll see the symbols then:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
Here_is_Delay():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
...

x86 ADC carry flag and length

I'm just doing some analysis of a disassembled 32-bit program I wrote in C. Here is a portion of the output from the disassembler:
41153c 02 00 add al, [eax]
41153e 00 00 add [eax], al
411540 44 inc esp
411541 15 41 00 F8 FF adc eax, 0xfff80041 ; "A"
411546 FF invalid
I'm just trying to make sense of the ADC instruction. From what I've read in both the Intel developers manual, and various articles on x86 ASM the opcode 0x15 is the ADC instruction using EAX as the destination, and it would appear that following the opcode is a four byte 'immediate' that indicates the memory address for use in the add with carry.
However I'm a little unsure as to why the following byte (0xFF) is being marked as invalid.
I'm quite new to assembler, but I'm assuming that this is something to do with the carry flag, and might possibly be there to sign extend the value.
I've used two separate disassemblers to look at the code, and whilst one marks it as invalid, the other simply ignores it.
If someone could offer some advice, I'd appreciate it.
Thanks
UPDATE
I'll add a bit more information to this post, as there are another two ADC operations, and one of them doesn't have the extra 'invalid' byte
411547 FF 04 00 inc [eax+eax]
41154a 00 00 add [eax], al
41154c 61 popa
41154d 15 41 00 EC FF adc eax, 0xffec0041 ; "A"
411552 FF invalid
411553 FF 04 00 inc [eax+eax]
411556 00 00 add [eax], al
411558 5C pop esp
411559 15 41 00 74 65 adc eax, 0x65740041 ; "A"
41155e 73 74 jnc 0x4115d4 ↓
The second ADC that's taking place also has the extra 0xff 'invalid' byte, however the third does not.
From what I can see, the only difference between all three ADC operations is that the first two start with 0xff and have an extra 'invalid' byte, whilst the third does not. I'm assuming that this is forming some kind of flag to indicate if the extra byte is needed.

This is not code. You're trying to disassemble data.
If we rearrange the bytes aligned to 4, we can make some sense of it:
411548: 04 00 00 00 ; integer 4
41154C: 61 15 41 00 ; address 0x411561
411550: EC FF FF FF ; integer 0xFFFFFFEC (-20)
411554: 04 00 00 00 ; integer 4
411558: 5C 15 41 00 ; address 0x41155C
41155C: 74 65 73 74 ; string 'test'

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight