Can't erase first page of samd21 microcontroller flash memory - arm

I'm using an Atmel SAMD21 microcontroller, which has a Cortex-M0+ core. To bootload a new program, I want to erase the vector table located at address 0x0000 0000 and write new data there. I'm encountering two problems:
Doing an erase on address 0 does not appear to actually erase anything.
I get a hard fault when I try to write to address 0.
I'm going to try changing the address of VTOR and see if that gets me anywhere (edit: it didn't help). Aside from that, does anybody know:
Is there a way to tell the microcontroller "hey, I know what I'm doing, let me poke this address"?
Is the hard fault upon writing to address 0 something that is defined in the Cortex-M0+ spec (I couldn't find anything), or is it implementation-defined behavior?
edit
My bootloader resides at max_flash - 0x1000. I realize that this might not be the best arrangement, so I will probably change things so that the bootloader (with its own vector table) resides at 0x0000. I'd still like to know why I can't write to address 0x0000, though. There's nothing in the Cortex-M0+ documentation that suggests that I shouldn't be able to do that.
I've checked the following things:
Are my interrupts disabled? (they are, I've got an __asm__ volatile("cpsid if"); right before I start poking memory)
Does changing the value of VTOR make a difference? (it doesn't)
Is the flash page I'm trying to erase "locked" by BOOTPROT? (it isn't, BOOTPROT = 7.)
Are any regions listed as locked in the LOCK register? (they aren't, LOCK = 0xffff)
Is it executing from the page it's trying to erase? (nope, the back-trace says that pc = 0xf1dc before the hard fault happens)
Any other things to check?

Sadly, Atmel (this can't be blamed on Microchip; it happened before the acquisition) moved away from its built-in SAM-BA bootloader. Instead they offer a software/source version that you place in flash yourself, plus some extra controls to somewhat protect that space. There is no protection on the protection, though, so it is trivial for a program to unlock the region and erase or damage it. You are better off either writing your own bootloader, one that is simpler and easier to maintain (and trying not to erase it), or using the SWD interface, which competing products require as the only option if not as an alternative. I eventually went with the latter.
What I found was that it was not only easy to erase and write over that space, it was scary easy: once some magic was unlocked, simple stores to the space, intentional or accidental, could trash or overwrite it.
I'll post my code; take it or leave it. It has been a while since I read the datasheet, other than today, when I checked that you should search for BOOTPROT to find those protection bits and which register to change to disable that protection (if it is even on).
The PUTs and GETs are just store and load instruction abstractions.
//------------------------------------------------------------------------
//------------------------------------------------------------------------
#include "flash-bin.h"
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void PUT16 ( unsigned int, unsigned int );
unsigned int GET16 ( unsigned int );
void PUT8 ( unsigned int, unsigned int );
unsigned int GET8 ( unsigned int );
void dummy ( unsigned int );
#define PORT_BASE 0x41004400
#define PORTA_DIRSET (PORT_BASE+0x00+0x08)
#define PORTA_OUTCLR (PORT_BASE+0x00+0x14)
#define PORTA_OUTSET (PORT_BASE+0x00+0x18)
#define PORTA_OUTTGL (PORT_BASE+0x00+0x1C)
#define PORTB_DIRSET (PORT_BASE+0x80+0x08)
#define PORTB_OUTCLR (PORT_BASE+0x80+0x14)
#define PORTB_OUTSET (PORT_BASE+0x80+0x18)
#define PORTB_OUTTGL (PORT_BASE+0x80+0x1C)
#define PORTA_PMUX05 (PORT_BASE+0x00+0x30+5)
#define PORTA_PINCFG10 (PORT_BASE+0x00+0x40+10)
#define PORTA_PINCFG11 (PORT_BASE+0x00+0x40+11)
#define PORTB_PMUX01 (PORT_BASE+0x80+0x30+1)
#define PORTB_PMUX11 (PORT_BASE+0x80+0x30+11)
#define PORTB_PINCFG03 (PORT_BASE+0x80+0x40+3)
#define PORTB_PINCFG22 (PORT_BASE+0x80+0x40+22)
#define PORTB_PINCFG23 (PORT_BASE+0x80+0x40+23)
#define GCLK_BASE 0x40000C00
#define GCLK_CTRL (GCLK_BASE+0x00)
#define GCLK_STATUS (GCLK_BASE+0x01)
#define GCLK_CLKCTRL (GCLK_BASE+0x02)
#define GCLK_GENCTRL (GCLK_BASE+0x04)
#define GCLK_GENDIV (GCLK_BASE+0x08)
#define PM_BASE 0x40000400
#define APBCMASK (PM_BASE+0x20)
#define SYSCTRL_BASE 0x40000800
#define OSC8M (SYSCTRL_BASE+0x20)
#define SERCOM5_BASE 0x42001C00
#define SERCOM5_CTRLA (SERCOM5_BASE+0x00)
#define SERCOM5_CTRLB (SERCOM5_BASE+0x04)
#define SERCOM5_BAUD (SERCOM5_BASE+0x0C)
#define SERCOM5_INTFLAG (SERCOM5_BASE+0x18)
#define SERCOM5_SYNCBUSY (SERCOM5_BASE+0x1C)
#define SERCOM5_DATA (SERCOM5_BASE+0x28)
#define SERCOM0_BASE 0x42000800
#define SERCOM0_CTRLA (SERCOM0_BASE+0x00)
#define SERCOM0_CTRLB (SERCOM0_BASE+0x04)
#define SERCOM0_BAUD (SERCOM0_BASE+0x0C)
#define SERCOM0_INTFLAG (SERCOM0_BASE+0x18)
#define SERCOM0_SYNCBUSY (SERCOM0_BASE+0x1C)
#define SERCOM0_DATA (SERCOM0_BASE+0x28)
#define STK_CSR 0xE000E010
#define STK_RVR 0xE000E014
#define STK_CVR 0xE000E018
#define STK_MASK 0x00FFFFFF
#define ACTLR 0xE000E008
#define CPUID 0xE000ED00
#define NVMCTRL_BASE 0x41004000
#define NVM_CTRLA (NVMCTRL_BASE+0x00)
#define NVM_CTRLB (NVMCTRL_BASE+0x04)
#define NVM_PARAM (NVMCTRL_BASE+0x08)
#define NVM_INTFLAG (NVMCTRL_BASE+0x14)
#define NVM_STATUS (NVMCTRL_BASE+0x18)
#define NVM_ADDR (NVMCTRL_BASE+0x1C)
#define NVM_LOCK (NVMCTRL_BASE+0x20)
//------------------------------------------------------------------------
static void clock_init ( void )
{
unsigned int ra;
ra=GET32(OSC8M);
ra&=~(3<<8);
PUT32(OSC8M,ra);
}
//------------------------------------------------------------------------
#ifdef USE_SERCOM0
//TX PA10 SERCOM0 PAD[2] FUNCTION C SERCOM2 PAD[2] FUNCTION D
//RX PA11 SERCOM0 PAD[3] FUNCTION C SERCOM2 PAD[3] FUNCTION D
//------------------------------------------------------------------------
static void uart_init ( void )
{
unsigned int ra;
ra=GET32(APBCMASK);
ra|=1<<2; //enable SERCOM0
PUT32(APBCMASK,ra);
PUT32(GCLK_GENCTRL,0x00010605);
PUT16(GCLK_CLKCTRL,0x4514);
PUT8(PORTA_PINCFG10,0x01);
PUT8(PORTA_PINCFG11,0x01);
PUT8(PORTA_PMUX05,0x22);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT32(SERCOM0_CTRLA,0x00000000);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT32(SERCOM0_CTRLA,0x00000001);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT32(SERCOM0_CTRLA,0x40310004);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT32(SERCOM0_CTRLB,0x00030000);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT16(SERCOM0_BAUD,50436);
while(GET32(SERCOM0_SYNCBUSY)) continue;
PUT32(SERCOM0_CTRLA,0x40310006);
while(GET32(SERCOM0_SYNCBUSY)) continue;
}
//------------------------------------------------------------------------
//static void uart_flush ( void )
//{
//while(1)
//{
//if(GET8(SERCOM0_INTFLAG)&2) break;
//}
//}
//------------------------------------------------------------------------
static void uart_send ( unsigned int d )
{
while(1)
{
if(GET8(SERCOM0_INTFLAG)&1) break;
}
PUT8(SERCOM0_DATA,d&0xFF);
}
//------------------------------------------------------------------------
//static unsigned int uart_recv ( void )
//{
//while(1)
//{
//if(GET8(SERCOM0_INTFLAG)&4) break;
//}
//return(GET8(SERCOM0_DATA)&0xFF);
//}
//------------------------------------------------------------------------
#else
//TX PB22 SERCOM5 PAD[2] PORT FUNCTION D
//RX PB23 SERCOM5 PAD[3] PORT FUNCTION D
//------------------------------------------------------------------------
static void uart_init ( void )
{
unsigned int ra;
ra=GET32(APBCMASK);
ra|=1<<7; //enable SERCOM5
ra|=1<<2; //enable SERCOM0
PUT32(APBCMASK,ra);
PUT32(GCLK_GENCTRL,0x00010605);
PUT16(GCLK_CLKCTRL,0x4519);
PUT8(PORTB_PINCFG22,0x01);
PUT8(PORTB_PINCFG23,0x01);
PUT8(PORTB_PMUX11,0x33);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT32(SERCOM5_CTRLA,0x00000000);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT32(SERCOM5_CTRLA,0x00000001);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT32(SERCOM5_CTRLA,0x40310004);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT32(SERCOM5_CTRLB,0x00030000);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT16(SERCOM5_BAUD,50436);
while(GET32(SERCOM5_SYNCBUSY)) continue;
PUT32(SERCOM5_CTRLA,0x40310006);
while(GET32(SERCOM5_SYNCBUSY)) continue;
}
//------------------------------------------------------------------------
//static void uart_flush ( void )
//{
//while(1)
//{
//if(GET8(SERCOM5_INTFLAG)&2) break;
//}
//}
//------------------------------------------------------------------------
static void uart_send ( unsigned int d )
{
while(1)
{
if(GET8(SERCOM5_INTFLAG)&1) break;
}
PUT8(SERCOM5_DATA,d&0xFF);
}
//------------------------------------------------------------------------
//static unsigned int uart_recv ( void )
//{
//while(1)
//{
//if(GET8(SERCOM5_INTFLAG)&4) break;
//}
//return(GET8(SERCOM5_DATA)&0xFF);
//}
//------------------------------------------------------------------------
#endif
//------------------------------------------------------------------------
static void hexstrings ( unsigned int d )
{
//unsigned int ra;
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x20);
}
//------------------------------------------------------------------------
static void hexstring ( unsigned int d )
{
hexstrings(d);
uart_send(0x0D);
uart_send(0x0A);
}
//------------------------------------------------------------------------
static void flash_busy ( void )
{
while(1)
{
if(GET8(NVM_INTFLAG)&(1<<0)) break;
}
}
//------------------------------------------------------------------------
static void flash_command ( unsigned int cmd )
{
PUT16(NVM_CTRLA,0xA500+cmd);
flash_busy();
}
//------------------------------------------------------------------------
#define FLASH_ER 0x02
#define FLASH_WP 0x04
#define FLASH_UR 0x41
#define FLASH_PBC 0x44
#define FLASH_INVALL 0x46
//------------------------------------------------------------------------
int notmain ( void )
{
unsigned int ra;
unsigned int addr;
unsigned int page_size;
unsigned int row_size;
unsigned int pages;
unsigned int rows;
clock_init();
uart_init();
hexstring(0x12345678);
hexstring(GET32(ACTLR));
hexstring(GET32(CPUID));
hexstring(GET32(NVM_PARAM));
ra=GET32(NVM_PARAM);
pages=ra&0xFFFF;
page_size=(ra>>16)&7;
page_size=8<<page_size;
row_size=page_size<<2;
rows=pages>>2;
hexstring(pages);
hexstring(page_size);
hexstring(rows);
hexstring(row_size);
flash_busy();
flash_command(FLASH_INVALL); //where do you use this if at all?
for(addr=0x0000;addr<0x8000;addr+=0x100)
{
hexstrings(addr); hexstring(GET8(NVM_INTFLAG));
PUT32(NVM_ADDR,addr);
flash_command(FLASH_UR); //unlock
flash_command(FLASH_ER); //erase row
}
for(ra=0x0000;ra<0x0040;ra+=4)
{
hexstrings(ra); hexstring(GET32(ra));
}
if(1)
{
flash_command(FLASH_INVALL); //where do you use this if at all?
flash_command(FLASH_PBC); //page buffer clear
for(addr=0x0000,ra=0;ra<(0x800>>2);ra++,addr+=4)
{
if((addr&0x3F)==0) hexstring(addr);
PUT32(addr,rom[ra]);
if((addr&0x3F)==0x3C) flash_busy();
}
for(ra=0x0000;ra<0x0040;ra+=4)
{
hexstrings(ra); hexstring(GET32(ra));
}
}
return(0);
}
//------------------------------------------------------------------------
//------------------------------------------------------------------------
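The geometry arithmetic in notmain() can be checked on a host before touching real flash. The following is a minimal sketch that mirrors the same shifts and masks; the struct and function names (nvm_geometry, nvm_decode_param) are made up for this example, and the sample register value 0x00031000 is an assumed reading, not one taken from the program above.

```c
#include <stdint.h>

/* Hypothetical helper type: flash geometry derived from NVM_PARAM. */
struct nvm_geometry {
    uint32_t pages;      /* number of pages (low 16 bits of NVM_PARAM) */
    uint32_t page_size;  /* bytes per page: 8 << PSZ */
    uint32_t row_size;   /* bytes per row: four pages */
    uint32_t rows;       /* number of erasable rows */
};

/* Same arithmetic as notmain(): pages, page size, then rows of 4 pages. */
struct nvm_geometry nvm_decode_param(uint32_t param)
{
    struct nvm_geometry g;
    g.pages = param & 0xFFFF;
    g.page_size = 8u << ((param >> 16) & 7);
    g.row_size = g.page_size << 2;
    g.rows = g.pages >> 2;
    return g;
}
```

With the assumed value 0x00031000 this gives 64-byte pages and 256-byte rows, which matches the 0x100 stride of the erase loop and the flash_busy() call at offset 0x3C, the last word of each 64-byte page.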
Pretty much all useful SWD/JTAG debuggers let you download and run a program in RAM, but not all of them have built-in support for the nuances of programming each flash. My personal preference is therefore a RAM program that carries the flash image as a payload and programs it in-application, so that any debugger can be used. If there isn't enough RAM for both, I either write a bootloader, or use the RAM program to burn the bootloader and then use the bootloader to burn bigger apps.
Edit
sram.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
.thumb_func
.globl PUT8
PUT8:
strb r1,[r0]
bx lr
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET8
GET8:
ldrb r0,[r0]
bx lr
.thumb_func
.globl GET16
GET16:
ldrh r0,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0xD00
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
}

Related

Is this the right way to access function?

I am currently using an "STM32F429I-DISC1" board with a joystick. I am trying to draw an object on the LCD screen and move it with the joystick. The drawing works fine, but I get the error "void value not ignored as it ought to be".
These two lines are the problem:
localX = Joy_ReadXY(CTRL_REG_IN3);
localY = Joy_ReadXY(CTRL_REG_IN4);
Can someone please tell me how I can fix this error,
and why I see it?
Main.c
#include "stm32f429i_discovery_lcd.h"
#define CTRL_REG_IN3 0b00011000
#define CTRL_REG_IN4 0b00100000
SemaphoreHandle_t xMutex;
Joystick_data xy;
void vTaskFunction1(void *pvParameters) {
uint16_t localX;
uint16_t localY;
for(;;) {
localX = Joy_ReadXY(CTRL_REG_IN3);
localY = Joy_ReadXY(CTRL_REG_IN4);
xSemaphoreTake( xMutex, portMAX_DELAY );
xy.x = localX;
xy.y = localY;
xSemaphoreGive( xMutex );
HAL_Delay(10);
}
}
void vTaskFunction2(void *pvParameters) {
uint32_t xCoord = 240/2;
uint32_t yCoord = 320/2;
uint8_t reads = 0;
uint8_t ballRadius = 5;
uint16_t xLimitMin = ballRadius+25;
uint16_t xLimitMax = 240-ballRadius-25;
uint16_t yLimitMin = ballRadius+25;
uint16_t yLimitMax = 320-ballRadius-25;
for(;;) {
xSemaphoreTake( xMutex, portMAX_DELAY );
if (xy.x > 3000 && !(xCoord < xLimitMin))
xCoord -= 5;
if (xy.x < 1000 && !(xCoord > xLimitMax))
xCoord += 5;
if (xy.y > 3000 && !(yCoord < yLimitMin))
yCoord -= 5;
if (xy.y < 1000 && !(yCoord > yLimitMax))
yCoord += 5;
reads++;
BSP_LCD_Clear(LCD_COLOR_WHITE);
BSP_LCD_DrawCircle(xCoord, yCoord, ballRadius);
BSP_LCD_FillCircle(xCoord, yCoord, ballRadius);
xSemaphoreGive(xMutex);
HAL_Delay(20);
}
}
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_SPI4_Init();
MX_TIM1_Init();
MX_USART1_UART_Init();
// LCD Things
BSP_LCD_Init();
BSP_LCD_LayerDefaultInit(1, LCD_FRAME_BUFFER);
BSP_LCD_SelectLayer(1);
BSP_LCD_SetBackColor(LCD_COLOR_WHITE); // Choose a color you like
BSP_LCD_Clear(LCD_COLOR_WHITE);
BSP_LCD_SetTextColor(LCD_COLOR_DARKBLUE); // Choose a color you like
MX_FREERTOS_Init();
if ( xMutex == NULL )
{
xMutex = xSemaphoreCreateMutex();
if ( ( xMutex ) != NULL )
xSemaphoreGive( ( xMutex ) );
}
xTaskCreate(vTaskFunction1, "Task 1", 100, NULL, 1, NULL);
xTaskCreate(vTaskFunction2, "Task 2", 100, NULL, 1, NULL);
vTaskStartScheduler();
osKernelStart();
while (1)
{
}
}
Read joystick function (joystick.c)
#include <stdio.h>
#include <main.h>
#include "gpio.h"
#include "spi.h"
#define READ_SLAVE_OPERATION 0b10000000
#define READ_INCR_SLAVE_OPERATION 0b11000000
#define WRITE_SLAVE_OPERATION 0b00000000
#define CTRL_REG_IN3 0b00000011
#define CTRL_REG_IN4 0b00000100
#define OUT_X_L 0x28
#define OUT_X_H 0x29
#define OUT_Y_L 0x2A
#define OUT_Y_H 0x2B
#define OUT_Z_L 0x2C
#define OUT_Z_H 0x2D
#define JOY_CS_LOW() HAL_GPIO_WritePin(JOY_CS_GPIO_PORT, JOY_CS_PIN, 0)
#define JOY_CS_HIGH() HAL_GPIO_WritePin(JOY_CS_GPIO_PORT, JOY_CS_PIN, 1)
#define JOY_CS_GPIO_PORT GPIOC
#define JOY_CS_PIN GPIO_PIN_13
int16_t Joy_ReadXY(uint8_t reg1){
uint8_t pTxData1[2] = {reg1, 0};
uint8_t pRxData1[2] = {0, 0};
JOY_CS_LOW();
HAL_SPI_TransmitReceive(&hspi4, pTxData1, pRxData1, 2, HAL_MAX_DELAY);
JOY_CS_HIGH();
return pRxData1[0] << 8 | pRxData1[1];
}
Here, in Main.c, you call the function before telling the compiler what parameter types and return type it has.
localX = Joy_ReadXY(CTRL_REG_IN3);
localY = Joy_ReadXY(CTRL_REG_IN4);
That confuses the compiler, and it starts "guessing" about them.
Guessing that it is a void-returning function, the compiler then complains that you are expecting a return value from a function that returns void, i.e. nothing.
The returned void should be ignored instead of being written to a variable. At least that is what the compiler thinks...
To fix it, you should tell the compiler that there is a function elsewhere, giving its name, parameters and return type. That is done by providing the prototype
int16_t Joy_ReadXY(uint8_t reg1);
It needs to appear before the body of the function in which the external function is first called. (And you already confirmed in comments that it fixes the described problem in your code.)
Note that this is not needed for the other functions shown, because they are defined (with head and body) before they are called.
The same goes for other functions whose prototypes are provided in the header you include early on.
Actually, putting the prototype of your function into a header and including that in the same way would be the best way to solve this.
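As a rough single-file illustration of that ordering, assuming nothing about the real SPI driver: the prototype comes first (in a real project it would live in a header such as joystick.h, a hypothetical name), the caller follows, and the definition comes last. The body of Joy_ReadXY here is a host-side stand-in that fakes the two received bytes, and read_axis is a made-up wrapper.

```c
#include <stdint.h>

/* Prototype: normally placed in a header and included by Main.c. */
int16_t Joy_ReadXY(uint8_t reg1);

/* Caller that appears before the definition, as in Main.c.
   Thanks to the prototype, the compiler knows an int16_t comes back. */
int16_t read_axis(uint8_t reg)
{
    return Joy_ReadXY(reg);
}

/* Stand-in definition (assumption): pretends the device answered
   with the bytes {0x01, reg1} instead of doing an SPI transfer. */
int16_t Joy_ReadXY(uint8_t reg1)
{
    uint8_t rx[2] = { 0x01, reg1 };
    return (int16_t)((rx[0] << 8) | rx[1]);
}
```

Removing the prototype line reproduces the situation from the question: the call is seen before the compiler knows anything about the function.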

exchange function in system call table in x86

I am trying to replace the system call sys_open in order to track user behaviour.
I am using Linux kernel 4.13.0-041300.
This is my code so far:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/syscalls.h>
MODULE_LICENSE ("GPL");
//this is where the original sys_open call position will be saved
asmlinkage long (*original_open)(const char __user *filename, int flags, umode_t mode);
unsigned long **sys_call_table;
//this is to track how often my replaced function was called...
static int zeug = 0;
//this is my open function that i want to be replaced in the sys_call_table
asmlinkage long replaced_open(const char __user *filename, int flags, umode_t mode)
{
printk ("replaced was called...\n");
zeug++;
return original_open(filename, flags, mode);
}
static void enable_page_protection(void)
{
unsigned long value;
asm volatile("mov %%cr0, %0" : "=r" (value));
if((value & 0x00010000))
return;
asm volatile("mov %0, %%cr0" : : "r" (value | 0x00010000));
}
static void disable_page_protection(void)
{
unsigned long value;
asm volatile("mov %%cr0, %0" : "=r" (value));
if(!(value & 0x00010000))
return;
asm volatile("mov %0, %%cr0" : : "r" (value & ~0x00010000));
}
//the function to get the system_call_table
static unsigned long **aquire_sys_call_table(void)
{
unsigned long int offset = PAGE_OFFSET;
unsigned long **sct;
while (offset < ULLONG_MAX) {
sct = (unsigned long **)offset;
if (sct[__NR_close] == (unsigned long *) sys_close)
return sct;
offset += sizeof(void *);
}
return NULL;
}
static int __init minit (void)
{
printk ("minit: starting...\n");
if(!(sys_call_table = aquire_sys_call_table()))
return -1;
printk ("minit: sys_call_table replaced...\n");
disable_page_protection();
{
//here i print the function name of the current function in sys_call_table
printk ("minit: entry before replacing:%pF\n", sys_call_table[__NR_open]);
//here i store the real sys_open function and change to my func
original_open =(void * )xchg(&sys_call_table[__NR_open],(unsigned long *)replaced_open);
}
enable_page_protection();
return 0;
}
static void mexit (void)
{
printk ("mexit started.\n");
printk ("Open was called %d times...\n",zeug);
if(!sys_call_table) return;
//here i print the stored function again
printk ("at exit:%pF\n", sys_call_table[__NR_open]);
disable_page_protection();
{
//change back to original sys_open function
xchg(&sys_call_table[__NR_open], (unsigned long *)original_open);
}
printk ("after restoring:%pF\n", sys_call_table[__NR_open]);
enable_page_protection();
}
module_init(minit);
module_exit(mexit);
My plan:
After loading this module with insmod, every sys_open system call will be "redirected" to my function replaced_open. This function counts its calls and then calls the original open function.
After rmmod of my module, the original sys_open system call will be used again.
It seems that the replacing works: after insmod I get the result replaced_open+0x0/0x40 [kroot].
That means the original function sys_open was replaced by my replaced_open, right?
After removing my module I get the message SyS_open+0x0/0x20.
So it seems like replacing works.
My problem is: I don't see any printed messages from my replaced_open function. Also, it seems that counting doesn't work.
It feels like the function wasn't replaced properly.
Can you help me?

achieve GCC cas function for version 4.1.2 and earlier

For my new company project, the code has to run as 32-bit, and the compile server is CentOS 5.0 with GCC 4.1.1, which has been a nightmare.
The project uses lots of functions like __sync_fetch_and_add, which are only provided in GCC 4.1.2 and later.
I was told I cannot upgrade GCC, so I had to find another solution after Googling for several hours.
When I wrote a demo to test it, I got the wrong answer. The code below is meant to replace __sync_fetch_and_add:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static int count = 0;
int compare_and_swap(int* reg, int oldval, int newval)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1"
: "=m"(*reg), "=q" (result)
: "m" (*reg), "r" (newval), "a" (oldval)
: "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i = 0;
for(i = 0; i < 2000; ++i) {
compare_and_swap((int *)&count, count, count + 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i = 0; i < 10; ++i){
pthread_create(&id[i], NULL, test_func, NULL);
}
for(i = 0; i < 10; ++i) {
pthread_join(id[i], NULL);
}
//10*2000=20000
printf("%d\n", count);
return 0;
}
Then I got the wrong result:
[root@centos-linux-7 workspace]# ./asm
17123
[root@centos-linux-7 workspace]# ./asm
14670
[root@centos-linux-7 workspace]# ./asm
14604
[root@centos-linux-7 workspace]# ./asm
13837
[root@centos-linux-7 workspace]# ./asm
14043
[root@centos-linux-7 workspace]# ./asm
16160
[root@centos-linux-7 workspace]# ./asm
15271
[root@centos-linux-7 workspace]# ./asm
15280
[root@centos-linux-7 workspace]# ./asm
15465
[root@centos-linux-7 workspace]# ./asm
16673
I realize the problem is in this line:
compare_and_swap((int *)&count, count, count + 1);
Passing count + 1 is wrong!
So how can I implement the same function as __sync_fetch_and_add? The compare_and_swap function works when the third parameter is a constant.
By the way, is the compare_and_swap function right? I just Googled for it; I am not familiar with assembly.
I am in despair over this question.
………………………………………………………………………………………………………………………………………………………………………………………………………………………
After seeing the answer below, I used a while loop and got the right answer, but now I am even more confused.
Here is the code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int sync_add_and_fetch(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void *test_func(void *arg)
{
int i=0;
int result = 0;
for(i=0;i<2000;++i)
{
result = 0;
while(0 == result)
{
result = sync_add_and_fetch((int *)&count, count, 1);
}
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
The answer correctly comes out to 20000. But having to wrap every call to sync_add_and_fetch in a while loop seemed stupid to me, so I wrote it like this:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
static unsigned long count = 0;
int compare_and_swap(int* reg, int oldval, int incre)
{
register char result;
#ifdef __i386__
__asm__ volatile ("lock; cmpxchgl %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (oldval + incre), "a" (oldval) : "memory");
return result;
#elif defined(__x86_64__)
__asm__ volatile ("lock; cmpxchgq %3, %0; setz %1" : "=m"(*reg), "=q" (result) : "m" (*reg), "r" (newval + incre), "a" (oldval) : "memory");
return result;
#else
#error:architecture not supported and gcc too old
#endif
}
void sync_add_and_fetch(int *reg,int oldval,int incre)
{
int ret = 0;
while(0 == ret)
{
ret = compare_and_swap(reg,oldval,incre);
}
}
void *test_func(void *arg)
{
int i=0;
for(i=0;i<2000;++i)
{
sync_add_and_fetch((int *)&count, count, 1);
}
return NULL;
}
int main(int argc, const char *argv[])
{
pthread_t id[10];
int i = 0;
for(i=0;i<10;++i){
pthread_create(&id[i],NULL,test_func,NULL);
}
for(i=0;i<10;++i){
pthread_join(id[i],NULL);
}
//10*2000=20000
printf("%u\n",count);
return 0;
}
But when I run this code with ./asm after building with g++ -g -o asm asm.cpp -lpthread, asm just hangs for more than 5 minutes. This is what top shows in another terminal:
3861 root 19 0 102m 888 732 S 400 0.0 2:51.06 asm
I am confused: is this code not the same?
The 64-bit compare_and_swap is wrong as it swaps 64 bits but int is only 32 bits.
compare_and_swap should be used in a loop which retries it until it succeeds.
Your results look right to me. lock cmpxchg succeeds most of the time, but will fail if another core beat you to the punch. You're doing 20k attempts to cmpxchg count+1, not 20k atomic increments.
To write __sync_fetch_and_add with inline asm, you'll want to use lock xadd. It's specifically designed to implement fetch-add.
Implementing other operations, like fetch-or or fetch-and, requires a CAS retry loop if you actually need the old value. So you could make a version of the function that doesn't return the old value and is just a sync-and without the fetch, using lock and with a memory destination. (Compiler builtins can make this optimization based on whether the result is needed or not, but an inline asm implementation doesn't get a chance to choose asm based on that information.)
For efficiency, remember that and, or, add and many other instructions can use immediate operands, so a "re"(src) constraint would be appropriate (not "ri" for int64_t on x86-64, because that would allow immediates too large. https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html). But cmpxchg, xadd, and xchg can't use immediates, of course.
I'd suggest looking at compiler output for modern gcc (e.g. on http://godbolt.org/) for functions using the builtin, to see what compilers do.
But beware that inline asm can compile correctly given one set of surrounding code, but not the way you expect given different code. e.g. if the surrounding code copied a value after using CAS on it (probably unlikely), the compiler might decide to give the asm template two different memory operands for "=m"(*reg) and "m"(*reg), but your asm template assumes they will always be the same address.
IDK if gcc4.1 supports it, but "+m"(*reg) would declare a read/write memory operand. Otherwise perhaps you can use a matching constraint to say that the input is in the same location as an earlier operand, like "0"(*reg). But that might only work for registers, not memory, I didn't check.
"a" (oldval) is a bug: cmpxchg writes EAX on failure.
It's not ok to tell the compiler you leave a reg unmodified, and then write an asm template that does modify it. You will get unpredictable behaviour from stepping on the compiler's toes.
See c inline assembly getting "operand size mismatch" when using cmpxchg for a safe inline-asm wrapper for lock cmpxchg. It's written for gcc6 flag-output, so you'll have to back-port that and maybe a few other syntax details to crusty old gcc4.1.
That answer also addresses returning the old value so it doesn't have to be separately loaded.
(Using ancient gcc4.1 sounds like a bad idea to me, especially for writing multi-threaded code. So much room for error from porting working code with __sync builtins to hand-rolled asm. The risks of using a newer compiler, like stable gcc5.5 if not gcc7.4, are different but probably smaller.)
If you're going to rewrite code using __sync builtins, the sane thing would be to rewrite it using C11 stdatomic.h, or GNU C's more modern __atomic builtins that are intended to replace __sync.
The Linux kernel does successfully use inline asm for hand-rolled atomics, though, so it's certainly possible.
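To make the retry-loop point and the stdatomic.h suggestion concrete, here is a minimal sketch that assumes a C11 compiler (so not the GCC 4.1 from the question). The function names fetch_add_cas and fetch_add_direct are invented for this example; only the atomic_* calls come from the standard header.

```c
#include <stdatomic.h>

/* Fetch-add built from a CAS retry loop. On failure,
   atomic_compare_exchange_weak stores the current value of *p into
   'old', so every retry works with a fresh expected value. */
int fetch_add_cas(atomic_int *p, int inc)
{
    int old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, old + inc)) {
        /* 'old' was refreshed by the failed exchange; just try again. */
    }
    return old; /* value before the addition */
}

/* The same operation using the dedicated primitive, which on x86
   compilers typically implement with lock xadd. */
int fetch_add_direct(atomic_int *p, int inc)
{
    return atomic_fetch_add(p, inc);
}
```

The refresh of the expected value is the part that matters. A loop that keeps retrying with an oldval captured once at the call site, as in the last version in the question, can never succeed again after another thread has changed the counter, which would explain the hang.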
If you truly are in such a predicament, I would start with the following header file:
#ifndef SYNC_H
#define SYNC_H
#if defined(__x86_64__) || defined(__i386__)
static inline int sync_val_compare_and_swap_int(int *ptr, int oldval, int newval)
{
__asm__ __volatile__( "lock cmpxchgl %[newval], %[ptr]"
: "+a" (oldval), [ptr] "+m" (*ptr)
: [newval] "r" (newval)
: "memory" );
return oldval;
}
static inline int sync_fetch_and_add_int(int *ptr, int val)
{
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return val;
}
static inline int sync_add_and_fetch_int(int *ptr, int val)
{
const int old = val;
__asm__ __volatile__( "lock xaddl %[val], %[ptr]"
: [val] "+r" (val), [ptr] "+m" (*ptr)
:
: "memory" );
return old + val;
}
static inline int sync_fetch_and_sub_int(int *ptr, int val) { return sync_fetch_and_add_int(ptr, -val); }
static inline int sync_sub_and_fetch_int(int *ptr, int val) { return sync_add_and_fetch_int(ptr, -val); }
/* Memory barrier */
static inline void sync_synchronize(void) { __asm__ __volatile__( "mfence" ::: "memory"); }
#else
#error Unsupported architecture.
#endif
#endif /* SYNC_H */
The same extended inline assembly works for both x86 and x86-64. Only the int type is implemented, and you do need to replace possible __sync_synchronize() calls with sync_synchronize(), and each __sync_...() call with sync_..._int().
To test, you can use e.g.
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include "sync.h"
#define THREADS 16
#define PERTHREAD 8000
void *test_func1(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_add_and_fetch_int(sum, n + 1);
return NULL;
}
void *test_func2(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
while (n-->0)
sync_fetch_and_add_int(sum, n + 1);
return NULL;
}
void *test_func3(void *sumptr)
{
int *const sum = sumptr;
int n = PERTHREAD;
int oldval, curval, newval;
while (n-->0) {
curval = *sum;
do {
oldval = curval;
newval = curval + n + 1;
} while ((curval = sync_val_compare_and_swap_int(sum, oldval, newval)) != oldval);
}
return NULL;
}
static void *(*worker[3])(void *) = { test_func1, test_func2, test_func3 };
int main(void)
{
pthread_t thread[THREADS];
pthread_attr_t attrs;
int sum = 0;
int t, result;
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 65536);
for (t = 0; t < THREADS; t++) {
result = pthread_create(thread + t, &attrs, worker[t % 3], &sum);
if (result) {
fprintf(stderr, "Failed to create thread %d of %d: %s.\n", t+1, THREADS, strerror(result)); /* pthread_create returns the error code; it does not set errno */
exit(EXIT_FAILURE);
}
}
pthread_attr_destroy(&attrs);
for (t = 0; t < THREADS; t++)
pthread_join(thread[t], NULL);
t = THREADS * PERTHREAD * (PERTHREAD + 1) / 2;
if (sum == t)
printf("sum = %d (as expected)\n", sum);
else
printf("sum = %d (expected %d)\n", sum, t);
return EXIT_SUCCESS;
}
Unfortunately, I don't have an ancient version of GCC to test, so this has only been tested with GCC 5.4.0 and GCC-4.9.3 for x86 and x86-64 (using -O2) on Linux.
If you find any bugs or issues in the above, please let me know in a comment so I can verify and fix as needed.

a for loop not executed in an operating system C

I have a problem executing a for loop. I created a static table which contains defined values, then I pass the table as an argument to a function for processing.
Basically, my code looks like the following:
#define ID_01 0x0000
#define ID_02 0x0001
#define ID_03 0x0002
#define ID_04 0x0003
#define ID_05 0x0004
#define ID_06 0x0005
#define ID_07 0x0006
#define ID_08 0x0007
#define ID_09 0x0008
/*...
*/
#define ID_LAST 0xFFFF
static char table[]={
ID_01, ID_02 ,ID_03, ID_04, .... , ID_LAST}
void process( char *table){
int LastId=0;
char *Command;
for ( Command=table; LastId==0 ; Command++){
switch(Command)
{
case ID_01:
do_stuff01();
break;
case ID_02:
do_stuff02();
break;
...
case ID_LAST:
LastId=1;
break;
default:
break;
}
}
}
I've tried printing some messages to debug, but the program does not print any of them, not even those before and after the for loop.
But when I changed my for loop to:
for(i=0;i<10;i++)
all the messages were printed. However, I need to process the table the way I did in the first place.
PS: this part of the code is executed in an operating system task running on a microcontroller, and I'm just a beginner.
Right now you are using switch (Command), where Command holds the address of an element of table, not its value.
Change switch to
switch (*Command) { //Use value at pointed Command.
}
Also note that *Command dereferences a char, which is 1 byte. Your IDs are 2 bytes wide, so data is lost.
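To make the data loss concrete, here is a minimal sketch, assuming an 8-bit char and a 16-bit unsigned short (true on most microcontrollers, but not guaranteed by the C standard). The helper names stored_in_byte and stored_in_short are hypothetical and only serve the illustration.

```c
#include <stdio.h>

#define ID_09   0x0008
#define ID_LAST 0xFFFF

/* Hypothetical helper: what survives when a 16-bit ID is stored in an
   8-bit element. unsigned char keeps the conversion well defined. */
unsigned int stored_in_byte(unsigned int id)
{
    unsigned char narrow = (unsigned char)id;  /* keeps only the low 8 bits */
    return narrow;
}

/* Hypothetical helper: the same ID stored in a 16-bit element. */
unsigned int stored_in_short(unsigned int id)
{
    unsigned short wide = (unsigned short)id;  /* all 16 bits are kept */
    return wide;
}
```

Under those assumptions, stored_in_byte(ID_LAST) yields 0xFF while stored_in_short(ID_LAST) yields 0xFFFF. An 8-bit element can therefore never compare equal to case ID_LAST, so the original loop never sees its terminator and keeps reading past the end of the table.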
Change:
static char table[] = {ID_01, ID_02 ,ID_03, ID_04, .... , ID_LAST}
to unsigned short, so that each element holds a 16-bit value:
static unsigned short table[]={ID_01, ID_02 ,ID_03, ID_04, .... , ID_LAST}
Then modify your process function to accept unsigned short:
void process( const unsigned short *table) { //Unsigned short
int LastId = 0;
const unsigned short *Command; //Unsigned short, const to match the parameter
for ( Command=table; LastId==0 ; Command++){
switch(*Command) { //Added star
//...
}
}
//...
I would rewrite your process code to:
void process(const unsigned short *table, size_t tableLen) {
while (tableLen--) {
switch (*table) {
case ID_1: /* Do stuff */ break;
}
table++; //Increase pointer to next ID element
}
}
//Usage then like this:
static unsigned short table[] = {ID_1, ID_2, ID_3, ..., ID_n};
//Put pointer and length of table
process(table, sizeof(table)/sizeof(table[0]));
More generally, you can define a struct that maps each ID to its handler function, like below.
#include <stdio.h>
#define ID_01 0x0000
#define ID_02 0x0001
/* ... */
#define ID_LAST 0xFFFF
typedef void (*func)(void);
typedef struct {
unsigned short n; /* 16 bits, so that ID_LAST (0xFFFF) fits */
func f;
} fmap;
void do_something01() { }
void do_something02() { }
/* ... */
static fmap fmaps[] = {
{ID_01, do_something01},
{ID_02, do_something02},
/* ... */
{ID_LAST, NULL},
};
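The table alone does nothing until something walks it. Below is a minimal sketch of one way to do that, not necessarily what the answer's author had in mind. The dispatch and process helpers and the calls01/calls02 counters are hypothetical names added for illustration, and the ID field is assumed to be unsigned short so that 0xFFFF fits.

```c
#include <stddef.h>

#define ID_01   0x0000
#define ID_02   0x0001
#define ID_LAST 0xFFFF

typedef void (*func)(void);

typedef struct {
    unsigned short n;  /* command ID */
    func f;            /* handler; NULL marks the terminating entry */
} fmap;

/* Hypothetical counters so the effect of each handler is observable. */
static int calls01, calls02;
static void do_something01(void) { calls01++; }
static void do_something02(void) { calls02++; }

static const fmap fmaps[] = {
    {ID_01, do_something01},
    {ID_02, do_something02},
    {ID_LAST, NULL},
};

/* Run the handler registered for id. Returns 1 if one was found, else 0. */
int dispatch(unsigned short id)
{
    const fmap *m;
    for (m = fmaps; m->f != NULL; m++) {
        if (m->n == id) {
            m->f();
            return 1;
        }
    }
    return 0;
}

/* Walk a command list terminated by ID_LAST.
   Returns the number of commands that had a handler. */
int process(const unsigned short *table)
{
    int handled = 0;
    for (; *table != ID_LAST; table++)
        handled += dispatch(*table);
    return handled;
}
```

Adding a new command then only means adding one row to fmaps. The linear search is fine for a handful of IDs; for a large table, a sorted array with a binary search would be a reasonable alternative.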

Uart Check Receive Buffer interrupt vs. polling

Hello, I am learning how to use the UART with interrupts on Nios and I am not sure how to start. I have it working with polling, but I am not sure how to switch to interrupts.
Any help would be appreciated.
Here is my code:
#include <stdio.h> // for NULL
#include <sys/alt_irq.h> // for irq support function
#include "system.h" // for QSYS defines
#include "nios_std_types.h" // for standard embedded types
#define JTAG_DATA_REG_OFFSET 0
#define JTAG_CNTRL_REG_OFFSET 1
#define JTAG_UART_WSPACE_MASK 0xFFFF0000
#define JTAG_UART_RV_BIT_MASK 0x00008000
#define JTAG_UART_DATA_MASK 0x000000FF
volatile uint32* uartDataRegPtr = (uint32*)JTAG_UART_0_BASE;
volatile uint32* uartCntrlRegPtr = ((uint32*)JTAG_UART_0_BASE +
JTAG_CNTRL_REG_OFFSET);
void uart_SendByte (uint8 byte);
void uart_SendString (uint8 * msg);
//uint32 uart_checkRecvBuffer (uint8 *byte);
uint32 done = FALSE;
void uart_SendString (uint8 * msg)
{
int i = 0;
while(msg[i] != '\0')
{
uart_SendByte(msg[i]);
i++;
}
} /* uart_SendString */
void uart_SendByte (uint8 byte)
{
uint32 WSPACE_Temp = *uartCntrlRegPtr;
while((WSPACE_Temp & JTAG_UART_WSPACE_MASK) == 0 )
{
WSPACE_Temp = *uartCntrlRegPtr;
}
*uartDataRegPtr = byte;
} /* uart_SendByte */
uint32 uart_checkRecvBuffer (uint8 *byte)
{
uint32 return_value;
uint32 DataReg = *uartDataRegPtr;
*byte = (uint8)(DataReg & JTAG_UART_DATA_MASK);
return_value = DataReg & JTAG_UART_RV_BIT_MASK;
return_value = return_value >> 15;
return return_value;
} /* uart_checkRecvBuffer */
void uart_RecvBufferIsr (void* context)
{
} /* uart_RecvBufferIsr */
int main(void)
{
uint8* test_msg = (uint8*)"This is a test message.\n";
//alt_ic_isr_register ( ); // used for 2nd part when interrupts are enabled
uart_SendString (test_msg);
uart_SendString ((uint8*)"Enter a '.' to exit the program\n\n");
while (!done)
{
uint8 character_from_uart;
if (uart_checkRecvBuffer(&character_from_uart))
{
uart_SendByte(character_from_uart);
}
// do nothing
} /* while */
uart_SendString((uint8*)"\n\nDetected '.'.\n");
uart_SendString((uint8*)"Program exiting....\n");
return 0;
} /* main */
I am supposed to use uart_RecvBufferIsr instead of uart_checkRecvBuffer. How can I tackle this?
You will need to register your interrupt handler with alt_ic_isr_register(); the handler will then be called when the interrupt is raised. Details (including some sample code) can be found in this NIOS II PDF document from Altera.
As far as modifying your code to use the interrupt, here is what I would do:
Remove uart_checkRecvBuffer();
Change uart_RecvBufferIsr() to something like (sorry no compiler here so can't check syntax/functioning):
volatile uint32 recv_flag = 0;
volatile uint8 recv_char;
void uart_RecvBufferIsr(void *context)
{
uint32 DataReg = *uartDataRegPtr;
recv_char = (uint8)(DataReg & JTAG_UART_DATA_MASK);
recv_flag = (DataReg & JTAG_UART_RV_BIT_MASK) >> 15;
}
The moral of the story with the code above is that you should keep your interrupt handlers as short as possible and do anything that is not strictly necessary outside of them (perhaps by simplifying the logic I used with recv_char and recv_flag).
And then change your loop to something like:
while (!done)
{
if (recv_flag)
{
uart_SendByte(recv_char);
recv_flag = 0;
}
}
Note that there could be issues with what I've done depending on the speed of your port - if characters are received too quickly for the "while" loop above to process them, you would be losing some characters.
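One common way to reduce that risk is to have the ISR push bytes into a small ring buffer that the main loop drains. The following is only a sketch under stated assumptions: the names rx_push, rx_pop and RX_BUF_SIZE are hypothetical, it uses the standard uint8_t/uint32_t types rather than the uint8/uint32 types from nios_std_types.h, and it assumes a single producer (the ISR) and a single consumer (the main loop) on a single core.

```c
#include <stdint.h>

#define RX_BUF_SIZE 16u  /* hypothetical size; must be a power of two */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;  /* written only by the ISR */
static volatile uint32_t rx_tail;  /* written only by the main loop */

/* Call from the ISR with each received byte.
   Returns 1 on success, 0 if the buffer is full and the byte was dropped. */
int rx_push(uint8_t byte)
{
    uint32_t head = rx_head;
    if (head - rx_tail == RX_BUF_SIZE)
        return 0;
    rx_buf[head & (RX_BUF_SIZE - 1u)] = byte;
    rx_head = head + 1u;  /* publish only after the byte is stored */
    return 1;
}

/* Call from the main loop.
   Returns 1 and stores the oldest byte in *byte, or 0 if the buffer is empty. */
int rx_pop(uint8_t *byte)
{
    uint32_t tail = rx_tail;
    if (tail == rx_head)
        return 0;
    *byte = rx_buf[tail & (RX_BUF_SIZE - 1u)];
    rx_tail = tail + 1u;  /* free the slot only after the byte is read */
    return 1;
}
```

With this in place, the ISR would call rx_push() whenever the data register reports a valid byte, and the main loop would call rx_pop() and echo each byte it gets. Characters are then lost only when more than RX_BUF_SIZE of them arrive before the loop catches up.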
Finally, note that I declared some variables as "volatile" to prevent the compiler from keeping them in registers for example in the while loop.
But hopefully this will get you going.

Resources