I have a little function that writes values to HW using a volatile variable:
void gige_rx_prepare(void) {
volatile uint hw_write;
// more code here
hw_write = 0x32;
}
GCC 4.7.3 (Altera 13.1 Build 162) flags this variable as set but unused, even though, being volatile, it is what I use to write the HW registers.
I would still like to see this warning for any other variable. Is there a way to avoid it for volatile variables without resorting to a GCC attribute on each volatile variable in the code?
A local variable is not a good representation of a h/w register, and that's part of the reason why you see the warning.
The compiler complains (correctly) because hw_write is a local variable on the stack. In this case the compiler has enough information to infer that the assignment is pointless. If it were a global variable, or if you wrote through a pointer to a volatile uint, there would be no warning: the object being written would not be limited to the scope of the function, so it could be used somewhere else.
The following examples compile without any warnings:
volatile int hw_write2; // h/w register
void gige_rx_prepare2(void) {
// more code here
hw_write2 = 0x32;
}
void gige_rx_prepare3(void) {
volatile int *hw_write3 = (void*)0x1234; // pointer to h/w register.
// more code here
*hw_write3 = 0x32;
}
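When several registers belong to one peripheral, a common variation on the pointer approach is to describe the register block as a struct with volatile members and pass a pointer to it. This is a sketch under assumptions: the layout (gige_regs_t with ctrl and status fields) and the function name gige_rx_prepare4 are hypothetical, and the real offsets and base address would come from your device's documentation.

```c
#include <stdint.h>

/* Hypothetical register block: member order and widths must match the
   hardware's memory map, which is an assumption here. */
typedef struct {
    volatile uint32_t ctrl;    /* control register */
    volatile uint32_t status;  /* status register */
} gige_regs_t;

/* Writes through a pointer to volatile, so the store is always emitted
   and there is no local "set but not used" variable to warn about. */
void gige_rx_prepare4(gige_regs_t *regs)
{
    /* more code here */
    regs->ctrl = 0x32;
}
```

On the target you would call it with the peripheral's base address, e.g. gige_rx_prepare4((gige_regs_t *)0x1234), reusing the placeholder address from the pointer example.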
Related
I'm writing my own kernel (using multiboot2) and have followed this tutorial to bring it into long mode. I am now linking it with the following C code:
void kernel_main()
{
*(uint64_t*) 0xb8000 = 0x2f592f412f4b2f4f;
}
This prints OKAY to the screen.
However, I have now created a global variable, VGA_buffer, that holds this memory address:
volatile static const void* VGA_buffer = 0xb8000;
void kernel_main()
{
*(uint64_t*) VGA_buffer = 0x2f592f412f4b2f4f;
}
The code no longer works: OKAY does not appear on the screen.
How do I fix this?
I think this is because my linker script is not including the global variable data. This is what I've got:
ENTRY(start)
SECTIONS
{
. = 1M;
.boot :
{
*(.multiboot_header)
}
.text :
{
*(.text)
}
}
I also tried adding the following with no luck:
...
.rodata :
{
*(.rodata)
}
.data :
{
*(.data)
}
.bss :
{
*(.bss)
}
I'm not very familiar with custom linker scripts so I really don't know what I'm doing, and I'm not sure if this is even the problem.
You need to both link the .data section and execute some initialization code for it in order to initialize VGA_buffer. That means you must ensure that some kind of "CRT" (C run-time) start-up code performs the .data initialization. If you use a "no ABI" (bare-metal) build of the compiler, this might not happen at all unless you write it yourself.
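For reference, the .data/.bss part of such start-up code usually amounts to two loops. This is a minimal sketch, assuming an image that runs from ROM and a linker script that provides the load address of .data plus the start/end of .data and .bss; the function and parameter names are hypothetical. A boot loader that loads ELF segments itself (as GRUB does for multiboot kernels) normally places .data for you, so this matters most on bare-metal microcontrollers.

```c
#include <stdint.h>

/* Copy initialized data from its load address (e.g. in ROM) to its run
   address in RAM, then zero .bss. On a real target the five pointers
   would be symbols defined in the linker script (names are up to you). */
void crt_init_sections(uint8_t *data_start, uint8_t *data_end,
                       const uint8_t *data_load,
                       uint8_t *bss_start, uint8_t *bss_end)
{
    while (data_start < data_end)   /* .data: copy initial values */
        *data_start++ = *data_load++;

    while (bss_start < bss_end)     /* .bss: zero-fill */
        *bss_start++ = 0;
}
```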
Casting away const and volatile qualifiers and then accessing the object through the result can invoke undefined behavior. Not sure why you added the const in the first place.
volatile static void* VGA_buffer = 0xb8000; isn't valid C: initializing a pointer from an integer requires an explicit cast. See the "Pointer from integer/integer from pointer without a cast" issues.
Pedantically, always write the storage-class specifier at the start of the declaration: volatile static ... is an obsolescent style in C. Write static volatile ... instead.
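Putting these comments together, a declaration that needs no cast at the point of use might look like the sketch below, assuming identity-mapped video memory at 0xb8000 as in the question. Making the pointer itself const also lets GCC, with optimization enabled, typically fold the address straight into the store. This addresses correctness and style only; the missing OKAY turned out to be a build problem, as the answer below explains.

```c
#include <stdint.h>

/* Storage-class specifier first. The pointee is volatile (writes to video
   memory must not be optimized away); the pointer itself is const. The
   integer-to-pointer conversion is written as an explicit cast. */
static volatile uint64_t *const VGA_buffer = (volatile uint64_t *)0xb8000;

void kernel_main(void)
{
    /* Characters 'O','K','A','Y', each followed by attribute byte 0x2f
       (x86 is little-endian, so the low byte lands first). */
    *VGA_buffer = 0x2f592f412f4b2f4f;
}
```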
Hours wasted because I was missing two characters: -c
When compiling my C code into kernel.o, I forgot to tell gcc that this was a compilation-only step and that no linking should be done. After adding the -c flag, everything worked!
I got great hints along the way, but I never even noticed them:
I was wondering, when compiling my C code, why I had to pass -nostdlib to stop the standard library from being linked... it was because I forgot the -c flag.
I was also wondering why it complained that some externs were not provided: it was trying to link at this step when it shouldn't have been.
I was looking at the documentation on the Atmel website and I came across this example where they explain some issues with reordering.
Here's the example code:
#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
unsigned int ivar;
void test2( unsigned int val )
{
val = 65535U / val;
cli();
ivar = val;
sei();
}
In this example, they're implementing a critical region-like mechanism. The cli instruction disables interrupts and the sei instruction enables them. Normally, I would save the interrupt state and restore to that state, but I digress...
The problem they note is that, with optimization enabled, the division on the first line actually gets moved to after the cli instruction. This can cause problems when you're trying to stay inside the critical region for as short a time as possible.
How come this is possible if the cli() MACRO expands to inline asm which explicitly clobbers the memory? How is the compiler free to move things before or after this statement?
Also, I modified the code to include memory barriers before every statement in the form of __asm volatile("" ::: "memory"); and it doesn't seem to change anything.
I also removed the memory clobber from the cli() and sei() MACROs, and the generated code was identical.
Of course, if I declare the test2 function argument as volatile, there is no reordering, which I assume is because volatile accesses can't be reordered with respect to other volatile statements (which the inline asm technically is). Is my assumption correct?
Can volatile accesses be reordered with respect to volatile inline asm?
Can non-volatile accesses be reordered with respect to volatile inline asm?
What's weird is that Atmel claims they need the memory clobber just to enforce the ordering of volatile accesses with respect to the asm. That doesn't make any sense to me.
If the compiler barrier isn't the proper solution for this, then how could I go about preventing any outside code from "leaking" into the critical region?
If anyone could shed some light, I'd appreciate it.
Thanks
How come this is possible if the cli() MACRO expands to inline asm which explicitly clobbers the memory? How is the compiler free to move things before or after this statement?
This is due to implementation details of avr-gcc: the compiler's support library, libgcc, provides many functions written in assembly for performance, including integer division functions like __udivmodhi4. Not all of these functions clobber all of the call-clobbered registers specified by the avr-gcc ABI. In particular, __udivmodhi4 does not clobber the Z register.
avr-gcc makes use of this as follows: on machines without a 16-bit division instruction, like AVR, GCC would normally issue a library call instead of generating code inline. avr-gcc, however, pretends that the architecture does have such a division instruction and models it as affecting the processor registers exactly like the library call does. Finally, after all code analyses and optimizations, the avr backend prints this instruction as [R]CALL __udivmodhi4. Let's call this a transparent call, i.e. a call that the compiler's analysis does not see.
Example
int div (int a, int b, volatile const __flash char *z)
{
int ab;
(void) *z;
asm volatile ("" : "+r" (a));
ab = a / b;
asm volatile ("" : "+r" (ab));
(void) *z;
return ab;
}
Compile this with avr-gcc -S -Os -mmcu=atmega8 ... to get assembly file *.s:
div:
movw r30,r20
lpm r18,Z
rcall __divmodhi4
movw r24,r22
lpm r18,Z
ret
Explanation
(void) *z reads one byte from flash. To use the lpm instruction, the address must be in the Z register, which movw r30,r20 accomplishes. After reading via lpm, the compiler issues rcall __divmodhi4 to perform signed 16-bit division. If this were an ordinary (non-transparent) call, the compiler would know nothing about the internal workings of the callee; but because the avr backend models the call by hand, the compiler knows that the instruction sequence does not change Z and hence may use Z again after the call without further ado. This gives better code due to lower register pressure; in particular, Z need not be saved/restored around the division.
The asm just serves to order the code: It is volatile and hence must not be reordered against the volatile read *z. And the asm must not be reordered against the division because the asm changes a and ab – at least that's what we are pretending and telling the compiler by means of the constraints. (These variables are not actually changed, but that does not matter here.)
Also, I modified the code to include memory barriers before every statement in the form of __asm volatile("" ::: "memory"); and it doesn't seem to change anything.
The division does not touch memory (it's a transparent call without memory clobber) hence the compiler machinery may reorder it against memory clobber / accesses.
If you need a specific order, then you'll have to introduce artificial dependencies like in my example above.
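Applied to the test2 example from the question, that technique might look like the sketch below: an empty volatile asm that claims to modify val forces the division to be computed before it, and the volatile asm itself stays ahead of the volatile cli(). Treat this as a sketch and confirm the placement in the generated .s file. The host stand-ins for cli()/sei() are an assumption, only there so the code compiles off-target; on AVR the real macros from avr-libc are used.

```c
#if defined(__AVR__)
#include <avr/interrupt.h>   /* real cli()/sei() from avr-libc */
#else
/* Host stand-ins (assumption, for compiling off-target only): same shape
   as the real macros but with an empty instruction string. */
#define cli() __asm__ volatile ("" ::: "memory")
#define sei() __asm__ volatile ("" ::: "memory")
#endif

unsigned int ivar;

void test2(unsigned int val)
{
    val = 65535U / val;

    /* Artificial dependency: the asm pretends to read and modify val,
       so the division must be complete before this statement, which is
       itself volatile and therefore stays ahead of cli(). */
    __asm__ volatile ("" : "+r" (val));

    cli();
    ivar = val;      /* only the store is inside the critical region */
    sei();
}
```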
To tell ordinary calls apart from transparent ones, you can dump the generated assembly into the .s file by means of -save-temps -dp, where -dp prints insn names:
void func0 (void);
int func1 (int a, int b)
{
return a / b;
}
void func2 (void)
{
func0();
}
Every call that's neither call_insn nor call_value_insn is a transparent call, *divmodhi4_call in this case:
func1:
rcall __divmodhi4 ; 17 [c=0 l=1] *divmodhi4_call
movw r24,r22 ; 18 [c=4 l=1] *movhi/0
ret ; 23 [c=0 l=1] return
func2:
rjmp func0 ; 5 [c=0 l=1] call_insn/3
Summary
I'm porting ST's USB OTG Library to a custom STM32F4 board using the latest version of Sourcery CodeBench Lite toolchain (GCC arm-none-eabi 4.7.2).
When I compile the code with -O0, the program runs fine. When I compile with -O1 or -O2, it fails. By "fails" I mean it just stops: no hard fault, nothing. (Well, obviously it is doing something, but I don't have an emulator to debug it and find out, sorry. My hard fault handler is not being called.)
Details
I'm trying to make a call to the following function...
void USBD_Init(USB_OTG_CORE_HANDLE *pdev,
USB_OTG_CORE_ID_TypeDef coreID,
USBD_DEVICE *pDevice,
USBD_Class_cb_TypeDef *class_cb,
USBD_Usr_cb_TypeDef *usr_cb);
...but it doesn't seem to make it into the function body. (Is this a symptom of "stack-smashing"?)
The structures passed to this function have the following definitions:
typedef struct USB_OTG_handle
{
USB_OTG_CORE_CFGS cfg;
USB_OTG_CORE_REGS regs;
DCD_DEV dev;
}
USB_OTG_CORE_HANDLE , *PUSB_OTG_CORE_HANDLE;
typedef enum
{
USB_OTG_HS_CORE_ID = 0,
USB_OTG_FS_CORE_ID = 1
}USB_OTG_CORE_ID_TypeDef;
typedef struct _Device_TypeDef
{
uint8_t *(*GetDeviceDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetLangIDStrDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetManufacturerStrDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetProductStrDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetSerialStrDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetConfigurationStrDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetInterfaceStrDescriptor)( uint8_t speed , uint16_t *length);
} USBD_DEVICE, *pUSBD_DEVICE;
typedef struct _Device_cb
{
uint8_t (*Init) (void *pdev , uint8_t cfgidx);
uint8_t (*DeInit) (void *pdev , uint8_t cfgidx);
/* Control Endpoints*/
uint8_t (*Setup) (void *pdev , USB_SETUP_REQ *req);
uint8_t (*EP0_TxSent) (void *pdev );
uint8_t (*EP0_RxReady) (void *pdev );
/* Class Specific Endpoints*/
uint8_t (*DataIn) (void *pdev , uint8_t epnum);
uint8_t (*DataOut) (void *pdev , uint8_t epnum);
uint8_t (*SOF) (void *pdev);
uint8_t (*IsoINIncomplete) (void *pdev);
uint8_t (*IsoOUTIncomplete) (void *pdev);
uint8_t *(*GetConfigDescriptor)( uint8_t speed , uint16_t *length);
uint8_t *(*GetUsrStrDescriptor)( uint8_t speed ,uint8_t index, uint16_t *length);
} USBD_Class_cb_TypeDef;
typedef struct _USBD_USR_PROP
{
void (*Init)(void);
void (*DeviceReset)(uint8_t speed);
void (*DeviceConfigured)(void);
void (*DeviceSuspended)(void);
void (*DeviceResumed)(void);
void (*DeviceConnected)(void);
void (*DeviceDisconnected)(void);
}
USBD_Usr_cb_TypeDef;
I've tried to include all the source code relevant to this problem. If you want to see the entire source code you can download it here: http://www.st.com/st-web-ui/static/active/en/st_prod_software_internet/resource/technical/software/firmware/stm32_f105-07_f2_f4_usb-host-device_lib.zip
Solutions Attempted
I tried playing with #pragma GCC optimize ("O0"), __attribute__((optimize("O0"))), and declaring certain definitions as volatile, but nothing worked. I'd rather just modify the code to make it play nicely with the optimizer anyway.
Question
How can I modify this code to make it play nice with GCC's optimizer?
There doesn't seem to be anything wrong with the code you showed, so this answer will be more general.
What are typical errors with "close to hardware" code that works properly unoptimized and fails with higher optimization levels?
Think about the differences between -O0 and -O1/-O2: optimization strategies include, among others, loop unrolling (doesn't seem to be dangerous), holding values in registers as long as possible, dead code elimination, and instruction reordering.
Improved register usage typically causes problems at higher optimization levels if hardware registers that can change at any time aren't properly declared volatile (see PokyBrain's comment above). The optimized code will try to hold values in registers as long as possible, so your program fails to notice changes on the hardware side. Make sure to declare hardware registers volatile.
Dead code elimination will likely cause problems if you need to read a hardware register to produce some effect on the hardware that the compiler doesn't know about, and then don't do anything with the value you just read. Such hardware accesses may be optimized away if the register isn't declared volatile. Make sure the register is volatile, and cast deliberate dummy reads to (void) to document the intent (and silence unused-value warnings).
Instruction reordering: if you need to access different hardware registers in a certain sequence, note that the compiler won't reorder volatile accesses relative to each other, but it is free to move ordinary (non-volatile) loads and stores across them, even when the hardware registers are properly declared volatile. Where that matters, add compiler memory barriers (__asm__ __volatile__("" ::: "memory");) to enforce the required access sequence. Such a barrier only constrains the compiler; it emits no instruction itself.
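The first two points can be sketched as follows, assuming a hypothetical 32-bit status register whose bit 0 means "ready" and a second register that is read only for its side effect (for example, clearing a flag). Both functions take pointers to volatile, so on the target you would pass the register addresses from the reference manual; in a host test any ordinary variable will do. The names wait_ready, clear_by_read and READY_BIT are illustrative only.

```c
#include <stdint.h>

#define READY_BIT (1u << 0)  /* hypothetical bit position */

/* Spin until the ready bit is set. Because *status is volatile, the load
   is repeated on every iteration instead of being hoisted out of the loop
   and kept in a register. */
uint32_t wait_ready(const volatile uint32_t *status)
{
    uint32_t v;
    while (((v = *status) & READY_BIT) == 0u) {
        /* busy-wait */
    }
    return v;
}

/* Read a register purely for its side effect. The access is kept because
   it is volatile; the (void) cast documents that the value is discarded. */
void clear_by_read(const volatile uint32_t *reg)
{
    (void)*reg;
}
```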
Although unlikely, it might still be the case that you found a compiler bug. Optimization is not an easy job, especially when it comes close to hardware. It might be worth a peek into the gcc bug database.
If none of this helps, you sometimes just can't avoid digging into the generated assembler code to make sure it's doing what it is supposed to do.
I'm using the CodeSourcery arm-linux-eabi cross-compiler and have problems with the compiler not generating certain code because it thinks it's unused, especially around a system call. Is there any way to get around this?
For example, this code does not initialize the variable:
unsigned int temp = 42;
asm volatile("mov R1, %0" :: "r" (temp));
asm volatile("swi 1");
In this case temp never gets initialized to 42. However, if I add a printk after the initialization, it does get initialized to the correct value 42. I tried:
unsigned int temp __attribute__ ((used)) = 42;
It still doesn't work, and I get a warning:
'used' attribute ignored [-Wattributes]
This is in Linux kernel code.
Any tips?
This is not the correct way to use inline assembly. As written, the two statements are separate, and there is no reason the compiler has to preserve any register values between the two. You need to either put both assembly instructions in the same inline assembly block, with proper input and output constraints, or you could do something like the following which allows the compiler to be more efficient:
register unsigned int temp __asm__("r1") = 42;
__asm__ volatile("swi 1" : : "r"(temp) : "memory");
(Note that I added "memory" to the clobber list; I'm not sure which syscall you're making, but if the syscall writes to any object in user space, "memory" needs to be listed in the clobber list.)
I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.
What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.
To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:
#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_
#include <stdint.h>   // uint8_t, uint16_t used below
#define TWI_INPUT_QUEUE_SIZE 256
// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);
typedef struct {
uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
volatile uint16_t length; // currently used bytes in the buffer
twi_slave_callback_t slave_callback;
} twi_global_slave_t;
typedef struct {
uint8_t slave_address;
volatile twi_global_slave_t slave;
} twi_global_t;
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);
void twi_slave_interrupt_handler(twi_global_t *twi);   // called from the ISR in main.c
#endif
Now the C file (twi_driver.c):
#include <stdint.h>
#include "twi_master_driver.h"
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
twi->slave.length = 0;
twi->slave.slave_callback = slave_callback;
twi->slave_address = slave_address;
// temporary workaround <- why does this work??
twi->slave.slave_callback = twi->slave.slave_callback;
}
void twi_slave_interrupt_handler(twi_global_t *twi)
{
(twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);
// some other stuff (nothing touches twi->slave.slave_callback)
}
Then I build those two files into a static library (.a) and construct my main program (main.c)
#include <stdio.h>
#include <avr/io.h>
#include <avr/interrupt.h>
#include <util/delay.h>
#include "twi_master_driver.h"
// ...define microcontroller safe way for mystdout ...
twi_global_t bus_a;
ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
twi_slave_interrupt_handler(&bus_a);
}
void my_callback(uint8_t *buf, uint16_t len)
{
uint16_t i;   // len can be up to 256, so a uint8_t counter could wrap
fprintf(&mystdout, "C: ");
for(i = 0; i < len; i++)
{
fprintf(&mystdout, "%d,", buf[i]);
}
fprintf(&mystdout, "\n");
}
int main(int argc, char **argv)
{
twi_init(2, &bus_a, &my_callback);
// ...PMIC setup...
// enable interrupts.
sei();
// (code that causes interrupt to fire)
// spin while the rest of the application runs...
while(1){
_delay_ms(1000);
}
return 0;
}
I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs, I'm able to tell that the value assigned to twi->slave.slave_callback in the twi_init function differs from the one seen in the twi_slave_interrupt_handler function.
Though the numbers are meaningless on their own, in twi_init the value is 0x13b, while in twi_slave_interrupt_handler it prints as 0x100.
By adding the commented workaround line in twi_driver.c:
twi->slave.slave_callback = twi->slave.slave_callback;
The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?
As far as I can tell, I've marked the appropriate variables volatile, and I've tried both marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed that removing fprintf statements after the assignment in twi_init changed the value read later on.
The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.
Any ideas?
Edits:
resolved typos in code.
links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]
fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL
I've tried the same code included directly (rather than via a library) and I've got the same issue.
Edits (round 2):
I removed all optimizations, and without my "workaround" the code works as expected. Adding -Os back causes the error. Why is -Os corrupting my code?
Just a hunch, but what happens if you switch these two lines around:
twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;
Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.
It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when it's not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.
If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.
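To see where the length field ends up, the question's structs can be reproduced with __attribute__((packed)) standing in for -fpack-struct, an assumption made so the layout can be inspected with offsetof on any GCC target. This only demonstrates the odd offset described above; whether that misalignment is what actually corrupts the callback remains the answer's hypothesis. The helper length_offset is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TWI_INPUT_QUEUE_SIZE 256

typedef void (*twi_slave_callback_t)(uint8_t *, uint16_t);

/* Same members as the question's structs; packed explicitly instead of
   via the -fpack-struct command-line flag. */
typedef struct __attribute__((packed)) {
    uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
    volatile uint16_t length;
    twi_slave_callback_t slave_callback;
} twi_global_slave_t;

typedef struct __attribute__((packed)) {
    uint8_t slave_address;            /* 1 byte: shifts everything after it */
    volatile twi_global_slave_t slave;
} twi_global_t;

/* Byte offset of slave.length inside twi_global_t: 1 + 256 = 257 (odd). */
size_t length_offset(void)
{
    return offsetof(twi_global_t, slave.length);
}
```

With slave_address changed to uint16_t, as suggested, the same offset would be 2 + 256 = 258, an even address.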
This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.
Replace "&my_callback" with "my_callback" in function main().
Because different threads access the callback address, try protecting it with a mutex or read-write lock.
If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.