Why does a pointer to this function point to it with an offset of 1? - c

I'm trying to write a bare-metal blink program in C for a Nucleo-64 STM32F401RE board.
However, when I started debugging it (it doesn't blink yet), I found an odd address that I can't explain. This is the relevant part of the disassembly output:
blink.elf: file format elf32-littlearm
Disassembly of section .text:
08000000 <isr_vector_table>:
8000000: 20018000 andcs r8, r1, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <Reset_Handler>:
8000008: b480 push {r7}
800000a: af00 add r7, sp, #0
800000c: bf00 nop
800000e: 46bd mov sp, r7
8000010: bc80 pop {r7}
8000012: 4770 bx lr
Disassembly of section .ARM.attributes:
00000000 <.ARM.attributes>:
0: 00002d41 andeq r2, r0, r1, asr #26
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 00000023 andeq r0, r0, r3, lsr #32
10: 2d453705 stclcs 7, cr3, [r5, #-20] ; 0xffffffec
14: 0d06004d stceq 0, cr0, [r6, #-308] ; 0xfffffecc
18: 02094d07 andeq r4, r9, #448 ; 0x1c0
1c: 01140412 tsteq r4, r2, lsl r4
20: 03170115 tsteq r7, #1073741829 ; 0x40000005
24: 01190118 tsteq r9, r8, lsl r1
28: 061e011a ; <UNDEFINED> instruction: 0x061e011a
2c: Address 0x0000002c is out of bounds.
The Reset_Handler function itself is at the right address, but when I use its name as a pointer in the code, the pointer is one address further! Here is the corresponding code:
extern int _stack_top; // highest memory address (end of SRAM)
void Reset_Handler (void);
__attribute__((section(".isr_vector"))) int* isr_vector_table[] = {
    &_stack_top,            // initial stack pointer
    (int*) Reset_Handler    // reset vector
};
void Reset_Handler (void) {
    // empty for now
}
And here is the linker script I used, which is basically the same as the one in most tutorials:
OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 96K
.text :
. = ALIGN(4);
. = ALIGN(4);
_etext = .;
.rodata :
. = ALIGN(4);
. = ALIGN(4);
.ARM.extab :
*(.ARM.extab* .gnu.linkonce.armextab.*)
.ARM :
__exidx_start = .;
__exidx_end = .;
.preinit_array :
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
.init_array :
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
.fini_array :
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(.fini_array*))
KEEP (*(SORT(.fini_array.*)))
PROVIDE_HIDDEN (__fini_array_end = .);
. = ALIGN(4);
_sidata = LOADADDR(.data);
.data :
. = ALIGN(4);
_sdata = .;
. = ALIGN(4);
_edata = .;
.bss :
. = ALIGN(4);
_sbss = .;
__bss_start__ = _sbss;
. = ALIGN(4);
_ebss = .;
__bss_end__ = _ebss;
} > SRAM
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
.ARM.attributes 0 : { *(.ARM.attributes) }
So why is the address stored in isr_vector_table 08000009 and not 08000008?
The only ways I have found so far to get the right value were hardcoding it, or defining an extra section for Reset_Handler so I could use its address as another extern value, like _stack_top.
Here are the commands I used for compilation, in case they are needed to find an answer:
cd C:/bare_metal
arm-none-eabi-gcc.exe -g main.c -o blink.elf -Wall -T STM32F4.ld -mcpu=cortex-m4 -mthumb --specs=nosys.specs -nostdlib -O0
arm-none-eabi-objdump.exe -D blink.elf

From the Programming Manual PM0214 of STM32F4:
Vector table
The vector table contains the reset value of the stack
pointer, and the start addresses, also called exception vectors, for
all exception handlers. Figure 11 on page 39 shows the order of the
exception vectors in the vector table. The least-significant bit of
each vector must be 1, indicating that the exception handler is Thumb code.
So LSb = 1 indicates that the instruction pointed to by that vector is a Thumb instruction. Cortex-M cores support only the Thumb instruction set. The toolchain knows this and sets LSb = 1 automatically for Thumb function addresses, so 08000009 is the correct value for the vector. If you somehow manage to make it 0, it won't work: the core faults instead of executing the handler.
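To make the bit-0 convention concrete, here is a small host-side C sketch that splits a vector-table entry into its Thumb flag and the address the core actually fetches from, using the values from the disassembly above. The helper names are made up for illustration; on the target you would simply keep `(int*) Reset_Handler` in the table and let the toolchain set the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a Cortex-M vector (or of any BX/BLX target) selects Thumb
   state; it is not part of the address the instruction is fetched from. */
static inline bool vector_is_thumb(uint32_t vector)
{
    return (vector & 1u) != 0;
}

/* Address of the handler's first instruction: the vector with bit 0 cleared. */
static inline uint32_t vector_code_address(uint32_t vector)
{
    return vector & ~1u;
}
```

For the table entry 0x08000009, `vector_is_thumb()` is true and `vector_code_address()` yields 0x08000008, the address objdump shows for `<Reset_Handler>`.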


GCC linker script: how to avoid a veneer call

I'm working on a project where I copy some functions from flash to RAM and call them there. Everything is OK except for one small problem: if I call a function directly, a veneer call is inserted instead (which does call the function in RAM correctly).
If I call it through a pointer, everything is OK. The debugger shows that the resolved address of the function is correct.
#define RAMFCALL(func, ...) {unsigned (* volatile fptr)() = (unsigned (* volatile)())func; fptr(__VA_ARGS__);}
RAMFCALL(FLASH_EraseSector, 0, 0);
and the corresponding calls:
311 RAMFCALL(FLASH_EraseSector, 0, 0);
0801738e: ldr r3, [pc, #88] ; (0x80173e8 <flashSTMInit+140>)
08017390: str r3, [sp, #12]
08017392: ldr r3, [sp, #12]
08017394: movs r1, #0
08017396: mov r0, r1
08017398: blx r3
312 FLASH_EraseSector(0,0);
0801739a: movs r1, #0
0801739c: mov r0, r1
0801739e: bl 0x801e9f0 <__FLASH_EraseSector_veneer>
The debugger shows the correct addresses.
And here is the corresponding part of the linker script:
. = ALIGN(512);
RAM_functions_load = LOADADDR(.RAM_functions);
PROVIDE(RAM_VectorTable_start = .);
PROVIDE(RAM_VectorTable_end = .);
. = ALIGN(4);
RAM_functions_start = .;
RAM_functions_end = .;
. = ALIGN(4);
RAM_functionsDATA_start = .;
RAM_functionsDATA_end = .;
. = ALIGN(4);
RAM_functionsBUFFER_start = .;
/* used by the startup to initialize data */
/* Initialized data sections goes into RAM, load LMA copy after code */
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
So, again, the question: how do I remove the veneer call?
I'll answer my own question, as I have found the reason :)
The bl instruction can only reach targets within a limited distance of the PC: ±16 MB in Thumb-2 (±32 MB in ARM state). I was calling a function in RAM from flash, and on STM32 flash sits at 0x08000000 while SRAM starts at 0x20000000, so the distance was far larger than bl can cover. The linker therefore had to insert a veneer.
Veneers can be eliminated by passing the -mlong-calls option to the compiler. Each call site becomes a bit longer, costing some performance, but that may still be better than losing performance in the veneers.
Individual functions can also be marked to be called through a register by applying the long_call attribute (assuming ARM, based on the assembly; described at https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html#ARM-Function-Attributes).
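The range limit can be checked with a little arithmetic. The Thumb-2 BL encoding (T1) holds a signed 25-bit, halfword-aligned byte offset measured from the BL instruction's address plus 4, i.e. roughly ±16 MB. The sketch below applies that rule to the `bl` at 0x0801739e from the listing; the RAM target 0x20000000 is an assumed address for illustration, since the real RAM address of FLASH_EraseSector is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a Thumb-2 BL at bl_addr reach target directly?
   Offset is relative to bl_addr + 4 and must lie in
   [-2^24, 2^24 - 2] and be even (addresses without the Thumb bit). */
bool thumb_bl_reaches(uint32_t bl_addr, uint32_t target)
{
    int64_t offset = (int64_t)target - ((int64_t)bl_addr + 4);
    return offset >= -(INT64_C(1) << 24)
        && offset <= (INT64_C(1) << 24) - 2
        && offset % 2 == 0;
}
```

The veneer at 0x0801e9f0 is within reach of the call site, but an address in SRAM is hundreds of MB away, which is why the linker routes the call through the veneer.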
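A sketch of applying the attribute per function, together with placing the function in RAM. It assumes the input section for RAM code is called `.RAM_functions` (the linker script fragment only shows the output section name in `LOADADDR(.RAM_functions)`); `RAMFUNC` and `ram_erase_sector` are hypothetical names. On non-ARM hosts the ARM-specific attributes are compiled out so the snippet still builds.

```c
#include <stdint.h>

#if defined(__arm__)
/* Place the function in the RAM section and force calls to it to load
   the address into a register (no BL range limit, so no veneer). */
#define RAMFUNC __attribute__((section(".RAM_functions"), long_call, noinline))
#else
/* Host build: ARM-only attributes dropped so the sketch still compiles. */
#define RAMFUNC __attribute__((noinline))
#endif

/* Hypothetical stand-in for a RAM-resident routine like FLASH_EraseSector. */
RAMFUNC uint32_t ram_erase_sector(uint32_t sector, uint32_t voltage_range)
{
    /* A real implementation would program the flash controller here;
       this placeholder only combines its arguments. */
    return sector + voltage_range;
}
```

Compared with -mlong-calls, this confines the longer call sequence to the few functions that actually live in RAM.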

Bootloader for STM32F405 Not Jumping to Application

I have a very small bootloader sitting in front of the main firmware, running on a custom-designed board based around the STM32F405VGT chip. Both applications use only minimally modified startup.s and linker files. The primary application runs fine when loaded at the start of flash, but does not launch from the bootloader.
When stepping through the code, as soon as it tries to launch the app, the program ends up in the WWDG_IRQHandler, which is aliased to the Default_Handler and just sits and spins in the infinite loop (WWDG is disabled for the bootloader).
Bootloader Code:
uint32_t addr = 0x08010000;
/* Get the application stack pointer (First entry in the application vector table) */
uint32_t appStack = (uint32_t) *((__IO uint32_t*) addr);
/* Get the application entry point (Second entry in the application vector table) */
ApplicationEntryPoint entryPoint = (ApplicationEntryPoint)*((__IO uint32_t*)(addr + sizeof(uint32_t)));
/* I would expect the value of entryPoint to be 0x802bc9c, based on the values in the .map file as well as the actual values read from the image using openocd. Instead, it comes back as 0x802bc9d; not sure if this is related to THUMB code */
/* Reconfigure vector table offset register to match the application location */
SCB->VTOR = addr;
/* Set the application stack pointer */
__set_MSP(appStack);
/* Start the application */
entryPoint();
Here is the .ld file for the application:
/* Include memory map */
/* Uncomment this section to use the real memory map */
BOOTLOADER (rx) : ORIGIN = 0x08000000, LENGTH = 32K
USER_PROPS (rw) : ORIGIN = 0x08008000, LENGTH = 16K
SYS_PROPS (r) : ORIGIN = 0x0800C000, LENGTH = 16K
APP_CODE (rx) : ORIGIN = 0x08010000, LENGTH = 448K
SWAP (rx) : ORIGIN = 0x08070000, LENGTH = 384K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
BOOT_RAM (xrw) : ORIGIN = 0x2001E000, LENGTH = 8K
CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
/* Uncomment this section to load directly into root memory */
APP_CODE (rx) : ORIGIN = 0x08000000, LENGTH = 1024K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
MEMORY_B1 (rx) : ORIGIN = 0x60000000, LENGTH = 0K
CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
/* Highest address of the user mode stack */
_estack = 0x20020000; /* end of 128K RAM */
/* Entry Point */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x000; /* required amount of heap (none) */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* The startup code goes first into EEPROM */
.isr_vector :
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
/* The program code and other data goes into EEPROM */
.text :
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
/* Constant data goes into EEPROM */
.rodata :
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
.preinit_array :
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
.init_array :
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
.fini_array :
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
/* used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections goes into RAM, load LMA copy after code */
.data :
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
/* Uninitialized data section */
. = ALIGN(4);
.bss :
/* This is used by the startup in order to initialize the .bss section */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >RAM
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
/* Remove information from the standard libraries */
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
.ARM.attributes 0 : { *(.ARM.attributes) }
The bootloader .ld is identical, except that all references to APP_CODE are replaced with BOOTLOADER.
Here is the startup.s file for the bootloader. The startup.s file for the application is identical, except that Boot_Reset_Handler is named Reset_Handler:
.syntax unified
.cpu cortex-m4
.fpu softvfp
.global g_pfnVectors
.global Default_Handler
/* start address for the initialization values of the .data section.
defined in linker script */
.word _sidata
/* start address for the .data section. defined in linker script */
.word _sdata
/* end address for the .data section. defined in linker script */
.word _edata
/* start address for the .bss section. defined in linker script */
.word _sbss
/* end address for the .bss section. defined in linker script */
.word _ebss
/* stack used for SystemInit_ExtMemCtl; always internal RAM used */
* @brief This is the code that gets called when the processor first
* starts execution following a reset event. Only the absolutely
* necessary set is performed, after which the application
* supplied main() routine is called.
* @param None
* @retval : None
.section .text.Boot_Reset_Handler
.weak Boot_Reset_Handler
.type Boot_Reset_Handler, %function
ldr sp, =_estack /* set stack pointer */
/* Copy the data segment initializers from flash to SRAM */
movs r1, #0
b LoopCopyDataInit
ldr r3, =_sidata
ldr r3, [r3, r1]
str r3, [r0, r1]
adds r1, r1, #4
ldr r0, =_sdata
ldr r3, =_edata
adds r2, r0, r1
cmp r2, r3
bcc CopyDataInit
ldr r2, =_sbss
b LoopFillZerobss
/* Zero fill the bss segment. */
movs r3, #0
str r3, [r2], #4
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system initialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* Call the application's entry point.*/
bl main
bx lr
.size Boot_Reset_Handler, .-Boot_Reset_Handler
* @brief This is the code that gets called when the processor receives an
* unexpected interrupt. This simply enters an infinite loop, preserving
* the system state for examination by a debugger.
* @param None
* @retval None
.section .text.Default_Handler,"ax",%progbits
b Infinite_Loop
.size Default_Handler, .-Default_Handler
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
.word _estack
.word Boot_Reset_Handler
.word NMI_Handler
.word HardFault_Handler
.word MemManage_Handler
.word BusFault_Handler
.word UsageFault_Handler
.word 0
.word 0
.word 0
.word 0
.word SVC_Handler
.word DebugMon_Handler
.word 0
.word PendSV_Handler
.word SysTick_Handler
I want to point out that this is not a duplicate of "Bootloader for Cortex M4 - Jump to loaded Application": although the problem seems similar, the author of that post did not adequately explain how the problem was resolved.
Everything is built using standard gcc tools for embedded development.
I have used the following approach on various STM32 Cortex-M3 and M4 parts:
Given the following embedded-assembler function (ARM/Keil armcc syntax):
__asm void boot_jump( uint32_t address )
{
   LDR SP, [R0]       ;Load new stack pointer address
   LDR PC, [R0, #4]   ;Load new program counter address
}
The bootloader switches to the application image thus:
// Switch off core clock before switching vector table
SysTick->CTRL = 0 ;
// Switch off any other enabled interrupts too
// Switch vector table
SCB->VTOR = APPLICATION_START_ADDR ;
//Jump to start address
boot_jump( APPLICATION_START_ADDR ) ;
Where APPLICATION_START_ADDR is the base address of the application area (addr in your example). This address is the start of the application's vector table, which begins with the initial stack pointer and the reset vector; the boot_jump() function loads these into the SP and PC registers to start the application as if it had been started from reset. The application's reset vector contains the application's execution start address.
The obvious difference between this and your solution is that interrupt sources are disabled before the vector table is switched. Of course, you may not be using any interrupts in the bootloader.
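Before jumping, a bootloader can also sanity-check the application's first two vector-table words. The sketch below assumes the memory map from the application linker script in the question (128K SRAM ending at _estack = 0x20020000, APP_CODE at 0x08010000 with 448K); the function names are hypothetical. Note that an odd reset vector such as 0x0802bc9d is expected: bit 0 is the Thumb bit. The `boot_jump()` variant at the end is an untested GCC counterpart of the armcc function above, compiled only for ARMv7-M targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map, taken from the application linker script. */
#define SRAM_START 0x20000000u
#define SRAM_END   0x20020000u                 /* _estack, end of 128K RAM */
#define APP_START  0x08010000u                 /* APP_CODE origin */
#define APP_END    (APP_START + 448u * 1024u)  /* APP_CODE length 448K */

/* Initial SP must lie in SRAM (it may equal the end address, since the
   stack grows down); the reset vector must point into the application
   area and have the Thumb bit (bit 0) set. */
bool app_vectors_look_valid(const uint32_t vectors[2])
{
    uint32_t sp = vectors[0];
    uint32_t reset = vectors[1];
    bool sp_ok = sp > SRAM_START && sp <= SRAM_END && (sp & 3u) == 0;
    bool thumb = (reset & 1u) != 0;            /* e.g. 0x0802bc9d */
    uint32_t pc = reset & ~1u;                 /* e.g. 0x0802bc9c */
    bool pc_ok = pc >= APP_START && pc < APP_END;
    return sp_ok && thumb && pc_ok;
}

#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
/* GCC form of boot_jump(): per AAPCS, r0 holds the vector table address.
   Call only after interrupts are disabled and VTOR has been updated. */
__attribute__((naked)) void boot_jump(uint32_t address)
{
    __asm volatile(
        "ldr sp, [r0]     \n"   /* initial MSP = vectors[0] */
        "ldr pc, [r0, #4] \n"   /* PC = reset vector; bit 0 keeps Thumb state */
    );
}
#endif
```

An erased flash area reads as 0xFFFFFFFF, so the check also catches "no application loaded".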

freeRTOS linking process: multiple definition of `_start'

I am trying to compile FreeRTOS for the Raspberry Pi 2. These are the commands I have tried so far:
arm-none-eabi-gcc -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard test.c -o test.o
arm-none-eabi-as -march=armv7-a -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard startup.s -o startup.o
arm-none-eabi-ld test.o startup.o -static -Map kernel7.map -o target.elf -T raspberrypi.ld
The first two work fine. However, the last one doesn't; it gives me the following errors:
startup.o: In function `_start':
(.init+0x0): multiple definition of `_start'
test.o::(.text+0x6c): first defined here
startup.o: In function `swi_handler':
(.init+0x28): undefined reference to `vPortYieldProcessor'
startup.o: In function `irq_handler':
(.init+0x38): undefined reference to `vFreeRTOS_ISR'
startup.o: In function `zero_loop':
(.init+0xcc): undefined reference to `rpi_cpu_irq_disable'
This is the corresponding code:
#include <stdio.h>
void exit(int code)
int main(void)
return 0;
.extern system_init
.extern __bss_start
.extern __bss_end
.extern vFreeRTOS_ISR
.extern vPortYieldProcessor
.extern rpi_cpu_irq_disable
.extern main
.section .init
.globl _start
;# All the following instruction should be read as:
;# Load the address at symbol into the program counter.
ldr pc,reset_handler ;# Processor Reset handler -- we will have to force this on the raspi!
;# Because this is the first instruction executed, of course it causes an immediate branch into reset!
ldr pc,undefined_handler ;# Undefined instruction handler -- processors that don't have thumb can emulate thumb!
ldr pc,swi_handler ;# Software interrupt / TRAP (SVC) -- system SVC handler for switching to kernel mode.
ldr pc,prefetch_handler ;# Prefetch/abort handler.
ldr pc,data_handler ;# Data abort handler/
ldr pc,unused_handler ;# -- Historical from 26-bit addressing ARMs -- was invalid address handler.
ldr pc,irq_handler ;# IRQ handler
ldr pc,fiq_handler ;# Fast interrupt handler.
;# Here we create an exception address table! This means that reset/hang/irq can be absolute addresses
reset_handler: .word reset
undefined_handler: .word undefined_instruction
swi_handler: .word vPortYieldProcessor
prefetch_handler: .word prefetch_abort
data_handler: .word data_abort
unused_handler: .word unused
irq_handler: .word vFreeRTOS_ISR
fiq_handler: .word fiq
/* Disable IRQ & FIQ */
cpsid if
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continueBoot
overHyped: /* Get out of HYP mode */
ldr r1, =continueBoot
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;# CPSR_MODE_MASK
orr r1, r1, #0x13 ;# CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
;# In the reset handler, we need to copy our interrupt vector table to 0x0000; it's currently at 0x8000
mov r0,#0x8000 ;# Store the source pointer
mov r1,#0x0000 ;# Store the destination pointer.
;# Here we copy the branching instructions
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load multiple values from indexed address. ; Auto-increment R0
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store multiple values from the indexed address. ; Auto-increment R1
;# So the branches get the correct address we also need to copy our vector table!
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Load from 4*n of regs (8) as R0 is now incremented.
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} ;# Store this extra set of data.
;# Set up the various STACK pointers for different CPU modes
mov r0,#0xD2
msr cpsr_c,r0
mov sp,#0x8000
mov r0,#0xD1
msr cpsr_c,r0
mov sp,#0x4000
mov r0,#0xD3
msr cpsr_c,r0
mov sp,#0x8000000
ldr r0, =__bss_start
ldr r1, =__bss_end
mov r2, #0
cmp r0,r1
it lt
strlt r2,[r0], #4
blt zero_loop
bl rpi_cpu_irq_disable
;# mov sp,#0x1000000
b main ;# We're ready?? Lets start main execution!
.section .text
b undefined_instruction
b prefetch_abort
b data_abort
b unused
b fiq
b hang
.globl PUT32
str r1,[r0]
bx lr
.globl GET32
ldr r0,[r0]
bx lr
.globl dummy
bx lr
* BlueThunder Linker Script for the raspberry Pi!
RESERVED (r) : ORIGIN = 0x00000000, LENGTH = 32K
INIT_RAM (rwx) : ORIGIN = 0x00008000, LENGTH = 32K
RAM (rwx) : ORIGIN = 0x00010000, LENGTH = 128M
* Our init section allows us to place the bootstrap code at address 0x8000
* This is where the Graphics processor forces the ARM to start execution.
* However the interrupt vector code remains at 0x0000, and so we must copy the correct
* branch instructions to 0x0000 - 0x001C in order to get the processor to handle interrupts.
.init : {
} > INIT_RAM = 0
.module_entries : {
__module_entries_start = .;
__module_entries_end = .;
__module_entries_size = SIZEOF(.module_entries);
* This is the main code section, it is essentially of unlimited size. (128Mb).
.text : {
} > RAM
* Next we put the data.
.data : {
} > RAM
.bss :
__bss_start = .;
__bss_end = .;
} > RAM
__exidx_start = .;
.ARM.exidx :
*(.ARM.exidx* .gnu.linkonce.armexidx.*)
} > RAM
__exidx_end = .;
* Place HEAP here???
PROVIDE(__HEAP_START = __bss_end );
* Stack starts at the top of the RAM, and moves down!
_estack = ORIGIN(RAM) + LENGTH(RAM);
As you can see, test.c doesn't contain an entry point called _start, and neither does its compiled assembly. Only startup.s does.
Any ideas about how I could solve this issue?
EDIT: all the code used can be found here, if needed: https://github.com/jameswalmsley/RaspberryPi-FreeRTOS

Linker assigns improper LMA to a section (using AT>)

I have a simple asm file with 3 sections:
.code 32
.section sec1
MOV R3, #10
.section sec2
MOV R1, #10
.section sec3
MOV R2, #10
And a linker script:
ram : ORIGIN = 0x00200000, LENGTH = 1K
rom : ORIGIN = 0x00100000, LENGTH = 1K
.text :
.sec1 :
.sec2 :
_ram_start = .;
}>ram AT> rom
.sec3 :
}>ram AT> rom
.data :
.bss :
I assume that .sec2 should have its VMA set to ram's origin and its LMA set to the address right after .sec1, but objdump gives me:
test2.o: file format elf32-littlearm
Idx Name Size VMA LMA File off Algn
0 .sec1 00000004 00100000 00100000 00000034 2**0
1 .sec2 00000004 00200000 00200000 00000038 2**0
2 .sec3 00000004 00200004 00200004 0000003c 2**0
Why is the .sec2 LMA set to ram?
It turns out that the sections in my .s file were not allocatable. That's why the LMA was wrong: if a section isn't allocated, its LMA can simply be the same as its VMA. I found this out while playing with objcopy: the output binary file was always empty. The asm file should look like this:
.code 32
.section sec1, "a"
MOV R3, #10
.section sec2, "a"
MOV R1, #10
.section sec3, "a"
MOV R2, #10
Normally, code goes into the .text section, which is allocatable by default. After adding "a", the linker produces the proper LMA addresses. (For code sections, "ax", allocatable plus executable, is the more conventional choice.)
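With the sections allocatable, the expected addresses can be worked out by hand: each memory region keeps its own location counter, and a `>ram AT> rom` section takes its VMA from the ram counter and its LMA from the rom counter. The C sketch below is a simplified model of that rule (no alignment, no other sections), assuming .sec1 is placed in rom as the objdump output suggests and using the 4-byte sizes from that output; the types and function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const char *name; uint32_t size; int vma_in_ram; } out_sec;
typedef struct { uint32_t vma, lma; } placement;

/* Simplified model: sections with vma_in_ram behave like ">ram AT> rom",
   the others like ">rom". Both counters advance independently. */
void place_sections(const out_sec *secs, size_t n,
                    uint32_t rom_origin, uint32_t ram_origin, placement *out)
{
    uint32_t rom = rom_origin, ram = ram_origin;
    for (size_t i = 0; i < n; i++) {
        if (secs[i].vma_in_ram) {
            out[i].vma = ram;          /* runs from RAM */
            out[i].lma = rom;          /* stored in ROM right after the previous load image */
            ram += secs[i].size;
        } else {
            out[i].vma = out[i].lma = rom;
        }
        rom += secs[i].size;           /* every section occupies ROM space */
    }
}
```

For the script in the question this gives .sec2 a VMA of 0x00200000 and an LMA of 0x00100004, and .sec3 a VMA of 0x00200004 and an LMA of 0x00100008, which is what objdump should report once the sections carry the "a" flag.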

How can I initialize the Raspberry properly?

I wrote a motor controller and tested it on a Raspberry Pi running the Arch Linux ARM distribution. Calculating the control signal took ~0.4 ms, so I thought I could do better with a real-time OS. I started with ChibiOS, but there the runtime was ~2.5 ms. I first used the Crossfire cross-compiler and then switched to Linaro; with Linaro the runtime was a bit worse, ~2.7 ms. What could be the problem? Is it possible that I'm not initializing the hardware in an optimal way?
* Stack pointers initialization.
ldr r0, =__ram_end__
/* Undefined */
mov sp, r0
ldr r1, =__und_stack_size__
sub r0, r0, r1
/* Abort */
mov sp, r0
ldr r1, =__abt_stack_size__
sub r0, r0, r1
/* FIQ */
mov sp, r0
ldr r1, =__fiq_stack_size__
sub r0, r0, r1
/* IRQ */
mov sp, r0
ldr r1, =__irq_stack_size__
sub r0, r0, r1
/* Supervisor */
mov sp, r0
ldr r1, =__svc_stack_size__
sub r0, r0, r1
/* System */
mov sp, r0
mov r0,#0x8000
mov r1,#0x0000
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
;# enable fpu
mrc p15, 0, r0, c1, c0, 2
orr r0,r0,#0x300000 ;# single precision
orr r0,r0,#0xC00000 ;# double precision
mcr p15, 0, r0, c1, c0, 2
mov r0,#0x40000000
fmxr fpexc,r0
mov r0, #0
ldr r1, =_bss_start
ldr r2, =_bss_end
And the memory setup:
__und_stack_size__ = 0x0004;
__abt_stack_size__ = 0x0004;
__fiq_stack_size__ = 0x0010;
__irq_stack_size__ = 0x0080;
__svc_stack_size__ = 0x0004;
__sys_stack_size__ = 0x0400;
__stacks_total_size__ = __und_stack_size__ + __abt_stack_size__ + __fiq_stack_size__ + __irq_stack_size__ + __svc_stack_size__ + __sys_stack_size__;
ram : org = 0x8000, len = 0x06000000 - 0x20
__ram_start__ = ORIGIN(ram);
__ram_size__ = LENGTH(ram);
__ram_end__ = __ram_start__ + __ram_size__;
. = 0;
.text : ALIGN(16) SUBALIGN(16)
_text = .;
} > ram
.ARM.extab : {*(.ARM.extab* .gnu.linkonce.armextab.*)} > ram
__exidx_start = .;
.ARM.exidx : {*(.ARM.exidx* .gnu.linkonce.armexidx.*)} > ram
__exidx_end = .;
.eh_frame_hdr : {*(.eh_frame_hdr)}
.eh_frame : ONLY_IF_RO {*(.eh_frame)}
. = ALIGN(4);
_etext = .;
_textdata = _etext;
.data :
_data = .;
. = ALIGN(4);
. = ALIGN(4);
. = ALIGN(4);
_edata = .;
} > ram
.bss :
_bss_start = .;
. = ALIGN(4);
. = ALIGN(4);
. = ALIGN(4);
_bss_end = .;
} > ram
PROVIDE(end = .);
_end = .;
__heap_base__ = _end;
__heap_end__ = __ram_end__ - __stacks_total_size__;
__main_thread_stack_base__ = __ram_end__ - __stacks_total_size__;
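The stack layout these symbols produce can be worked out with a few subtractions. The sketch below mirrors the startup fragment, where each /* Mode */ block sets SP and then subtracts that mode's stack size; it assumes the mode switches between those blocks (elided in the fragment) happen in the order the comments list them. Names are hypothetical, values are copied from the memory setup above.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the memory setup above. */
#define RAM_ORIGIN 0x8000u
#define RAM_LENGTH (0x06000000u - 0x20u)
#define RAM_END    (RAM_ORIGIN + RAM_LENGTH)          /* __ram_end__ */

enum { STK_UND = 0x0004, STK_ABT = 0x0004, STK_FIQ = 0x0010,
       STK_IRQ = 0x0080, STK_SVC = 0x0004, STK_SYS = 0x0400 };

/* Top of stack for each mode, in startup order:
   tops[0..5] = UND, ABT, FIQ, IRQ, SVC, SYS. */
static void compute_stack_tops(uint32_t ram_end, uint32_t tops[6])
{
    static const uint32_t sizes[6] = { STK_UND, STK_ABT, STK_FIQ,
                                       STK_IRQ, STK_SVC, STK_SYS };
    uint32_t sp = ram_end;
    for (size_t i = 0; i < 6; i++) {
        tops[i] = sp;          /* mov sp, r0 */
        sp -= sizes[i];        /* sub r0, r0, r1 */
    }
}
```

With __ram_end__ = 0x06007FE0, the System stack's top is 0x06007F44 and its lower end lands on 0x06007B44, which equals __heap_end__ (__ram_end__ minus __stacks_total_size__ = 0x49C).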
Where am I making the mistake(s)?
A long time ago (yes, that means sometime in the previous millennium), I used the old PC speaker pcsp device driver (a somewhat more current patch also exists) to control stepper motors via a relay attached to the data lines of the parallel port.
Note that's not the same driver as the current pcspkr driver (which only writes to the actual speaker, not to the parallel port); the parallel-output-capable parts of pcsp were never ported to the 2.6 audio architecture.
The trick there is that the driver can register a (high-priority, if needed) interrupt routine that does the actual device-register / I/O-port writes to change the line state. As a result, you simply ioctl() the sample rate to the driver and then asynchronously write "ramps" (of data signals to step up/down to/from a certain speed, or to perform a number of steps) created in memory; the driver then spools them for you, without the need for additional timing- or scheduling-sensitive code.
In the end, you got an 8-bit digital signal on the parallel-port data pins, with timing precision as high as your timer interrupt allows.
There were sufficient lines to drive a stepper; if you wanted to make it turn a given number of steps, you had to:
create a "ramp up" to speed it up from still to fastest
create a "rect wave" to keep it turning
create a "ramp down" to slow it down to still again
If the number of steps was small, you wrote the whole thing in one go; otherwise, you wrote the ramp-up, then as many of the rect-wave blocks as needed, then the ramp-down. Although you might program thousands of steps in one go, you'd only write three memory blocks of a few kB each, and the driver's interrupt handler did the rest.
It sounded rather funny if you attached a resistor-array DAC converter ;-)
The approach can be generalized to the Raspberry Pi: from the interrupt routine, simply write a GPIO control register (on ARM, device registers are always memory-mapped, so it's simply a memory access).
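The ramp/rect-wave/ramp idea can be sketched as a buffer generator that runs ahead of time, outside any timing-critical code. This is a minimal sketch under assumptions not in the answer: one byte is output per timer tick, bit 0 drives a STEP line, and the step period (in ticks) changes linearly between a slow and a fast value; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STEP_BIT 0x01u  /* assumption: bit 0 of each sample drives STEP */

/* Period (in samples) of step i: slow at both ends, fast in the middle,
   changing linearly over the first/last `ramp` steps. */
static unsigned step_period(unsigned i, unsigned steps, unsigned ramp,
                            unsigned slow, unsigned fast)
{
    unsigned from_end = steps - 1 - i;
    unsigned d = (i < from_end) ? i : from_end;   /* distance to nearer end */
    if (d >= ramp)
        return fast;                              /* cruise ("rect wave") */
    return slow - (slow - fast) * d / ramp;       /* ramp up / ramp down */
}

/* Fill buf with one high sample per step followed by (period - 1) low
   samples. Returns the number of samples written, or 0 if the parameters
   are invalid or buf is too small. */
size_t build_step_profile(uint8_t *buf, size_t cap, unsigned steps,
                          unsigned ramp_steps, unsigned slow_period,
                          unsigned fast_period)
{
    size_t pos = 0;
    if (fast_period == 0 || slow_period < fast_period)
        return 0;
    if (ramp_steps > steps / 2)
        ramp_steps = steps / 2;                   /* short move: no cruise */
    for (unsigned i = 0; i < steps; i++) {
        unsigned period = step_period(i, steps, ramp_steps,
                                      slow_period, fast_period);
        if (cap - pos < period)
            return 0;                             /* buffer too small */
        buf[pos++] = STEP_BIT;
        for (unsigned k = 1; k < period; k++)
            buf[pos++] = 0;
    }
    return pos;
}
```

For example, 4 steps with a 2-step ramp between periods 4 and 2 produce periods 4, 3, 3, 4, i.e. 14 samples. A long move would be split into the three blocks described above rather than generated as one huge buffer.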
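A minimal sketch of the interrupt side, assuming a BCM2835-style GPIO block in which writing 1 to a bit of a set register drives that pin high and writing 1 to a clear register drives it low (GPSETn/GPCLRn). The queue type, the function names and the choice of eight consecutive pins are hypothetical; the register pointers are passed in, so mapping the GPIO block (for example inside a kernel driver) is left out, and the Linux driver plumbing is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Drive eight consecutive pins, starting at first_pin (must be <= 24),
   from one 8-bit sample using write-1-to-set / write-1-to-clear registers. */
static void gpio_apply_sample(volatile uint32_t *set_reg,
                              volatile uint32_t *clr_reg,
                              unsigned first_pin, uint8_t sample)
{
    uint32_t high = (uint32_t)sample << first_pin;   /* pins to drive high */
    uint32_t all  = (uint32_t)0xFFu << first_pin;    /* the eight pins used */
    *set_reg = high;
    *clr_reg = all & ~high;                          /* remaining pins low */
}

/* Hypothetical queue of precomputed samples filled by the non-realtime side. */
struct sample_queue {
    const uint8_t *buf;
    size_t len;
    size_t pos;
    unsigned first_pin;
};

/* Body of a timer interrupt: apply exactly one sample per tick, nothing else. */
static void timer_tick(struct sample_queue *q,
                       volatile uint32_t *set_reg, volatile uint32_t *clr_reg)
{
    if (q->pos < q->len)
        gpio_apply_sample(set_reg, clr_reg, q->first_pin, q->buf[q->pos++]);
}
```

All the computation happens when the queue is filled; the interrupt only performs two register writes, which keeps its timing tight.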
Decoupling the "ramp" / "control signal" generation from the timing-sensitive state change (in effect, the "control signal application") and delegating the latter to the interrupt part of a device driver makes it possible to do such tasks with "normal" Linux.
Your timing precision, again, is limited by the rate and jitter of your timer interrupt. The Raspberry Pi is capable of running higher timer interrupt rates than an i386 was. I'm pretty sure 1 ms isn't a challenge with this approach (it wasn't in 1995). As said, the method depends on being able to pre-create the signal.
