Do you have any idea what might change that value? The code runs on an STM32. There are interrupts, but it is almost impossible for one to fire between the initialization of the value and the line after the if statement.
My first idea is that the value is stored in an illegal part of memory that is used by some register.
I'm compiling the code with optimization -O1; only this function is compiled with -O0 to make analysis easier. The software also crashes in run mode, so it's not a problem with debugging.
The change of value leads to an overflow a few lines later and crashes the system. This happens every time.
I don't have any other ideas. So far I've only checked that the code is correct: the declaration of the function and the place where it is called.
#pragma GCC push_options
#pragma GCC optimize ("O0")
MonitoringParseMessStatus monitoring_ack_message(char *msg, uint16_t length)
{
MonitoringParseMessStatus res = M_BAD_MESS;
uint16_t single_ack_message_length = 18; // minimum len
if(length < single_ack_message_length)
return M_NAK;
uint8_t len_mess = strlen(msg);
char single_message[len_mess + 1];
while(length >= len_mess && length >= single_ack_message_length)
{
memset(&single_message[0], 0, sizeof(single_message));
memcpy(&single_message[0], msg, single_ack_message_length);
msg += len_mess + 1;
length -= len_mess;
len_mess = strlen(msg);
char * ack_ptr = strstr(single_message, "\"ACK\"");
if(ack_ptr == NULL)
{
ack_ptr = strstr(single_message, "\"NAK\"");
if(ack_ptr != NULL)
{
return M_NAK;
}
res = M_BAD_MESS;
continue;
}
else
res = M_ACK;
char *seq_ptr = ack_ptr + strlen("\"ACK\"");
int seq = atoi(seq_ptr);
for(int i = 0; i < QUEUE_SIZE; i++)
{
if(monitoring_queue[i].set == false)
continue;
if(monitoring_queue[i].sequence != seq)
continue;
monitoring_queue[i].set = false;
monitoring_connected_set(monitoring_queue[i].monitoring_num, true);
monitoring_send_earliest_event();
break;
}
}
return res;
}
#pragma GCC pop_options
Optimization might trip up the debugger. For errors like this you first need to debug at the assembler level, to verify that the assignment to the variable has actually happened at the line where you placed the breakpoint. Decent debuggers have an option to single-step through the machine code interleaved with the C code.
Note: some expression simplification might occur even when optimization is disabled (-O0). Variables may still be placed in registers etc.
Other than that, local variables mysteriously changing value is often caused by stack overflow. Check the SP in your debugger when you are on this line.
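One common way to check for the stack overflow mentioned above, besides watching the SP, is to "paint" the stack region with a known pattern at startup and later count how much of it is still untouched. The following is only a minimal sketch of that idea: the names stack_paint, stack_unused_words and STACK_FILL are made up for illustration, and on a real STM32 project the region bounds would come from your own linker script, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu /* hypothetical paint pattern */

/* Fill the whole stack region with a known pattern.
 * Meant to be done once, before the region is in use. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* Count the words at the low end of the region that still hold the pattern.
 * Assumes a descending stack, so the low addresses are used last and the
 * result is the headroom that has never been touched. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL)
        n++;
    return n;
}
```

If the unused count ever reaches zero, the stack has grown through its whole region, and locals silently changing value would be a plausible symptom.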
I am using a simple software queue based on a write index and a read index.
Introduction details: Language: C, Compiler: GCC, Optimization: -O3 with extra parameters, Architecture: Armv7a, CPU: Multicore, 2 Cortex A-15, L2 Cache: Shared and enabled, L1 Cache: Every CPU, enabled, Architecture is supposed to be cache coherent.
CPU 1 does the writing stuff and CPU 2 does the reading stuff. Below is the very simplified example code. You can assume the initial values of the indexes are zero.
COMMON:
#define QUE_LEN 4
unsigned int my_que_write_index = 0; //memory
unsigned int my_que_read_index = 0; //memory
struct my_que_struct{
unsigned int param1;
unsigned int param2;
};
struct my_que_struct my_que[QUE_LEN]; //memory
CPU 1 runs:
void que_writer()
{
unsigned int write_index_local;
write_index_local = my_que_write_index; //my_que_write_index is in memory
my_que[write_index_local].param1 = 16; //my_que is my queue and stored in memory also
my_que[write_index_local].param2 = 32;
//similar writing stuff
++write_index_local;
if(write_index_local == QUE_LEN) write_index_local = 0;
my_que_write_index = write_index_local;
}
CPU 2 runs:
void que_reader()
{
unsigned int read_index_local, param1, param2;
read_index_local = my_que_read_index; //also in memory
while(read_index_local != my_que_write_index)
{
param1 = my_que[read_index_local].param1;
if(param1 == 0) FATAL_ERROR;
param2 = my_que[read_index_local].param2;
//similar reading stuff
my_que[read_index_local].param1 = 0;
++read_index_local;
if(read_index_local == QUE_LEN) read_index_local = 0;
}
my_que_read_index = read_index_local;
}
Okay, in the normal case the fatal error should never occur, because param1 of each queue entry is always stored with the constant value 16. But somehow param1 of a queue entry ends up being 0 and the fatal error occurs.
It is clear that this is some kind of race condition, but I can't figure out how it happens. The indexes are updated separately by the CPUs.
I don't want to fill my code with memory barriers without understanding the core of the problem. Do you have any idea how this is happening?
Details: This is a bare-metal system, this code runs with interrupts disabled, and there is no preemption or task switching.
The compiler and the CPU are allowed to rearrange stores and loads as they see fit (i.e. as long as a single threaded program would not be able to observe a difference). Of course for multi-threaded programs these effects are observable quite well.
For example, this code
write_index_local = my_que_write_index;
my_que[write_index_local].param1 = 16;
my_que[write_index_local].param2 = 32;
++write_index_local;
if(write_index_local == QUE_LEN) write_index_local = 0;
my_que_write_index = write_index_local;
could be reordered like this
a = my_que_write_index;
my_que_write_index = a == QUE_LEN - 1 ? 0 : a + 1;
my_que[a].param1 = 16;
my_que[a].param2 = 32;
Getting this stuff right requires atomics and barriers that avoid these kinds of reorderings. Check out Preshing's excellent series of blog posts to learn about atomics, this one is probably a good start: http://preshing.com/20120612/an-introduction-to-lock-free-programming/ but check out the following ones as well.
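To make the idea concrete, here is a rough sketch of the same single-writer, single-reader queue using C11 atomics from <stdatomic.h>. It is an illustration under assumptions, not a drop-in fix: the function names que_write and que_read are made up, the "queue full" check in the writer is an addition that the original code does not have, and it assumes your toolchain provides C11 atomics for this target.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QUE_LEN 4

struct my_que_struct {
    unsigned int param1;
    unsigned int param2;
};

static struct my_que_struct my_que[QUE_LEN];
static atomic_uint my_que_write_index; /* written only by the writer CPU */
static atomic_uint my_que_read_index;  /* written only by the reader CPU */

/* Writer side (CPU 1). Returns false if the queue is full. */
bool que_write(unsigned int p1, unsigned int p2)
{
    unsigned int w = atomic_load_explicit(&my_que_write_index, memory_order_relaxed);
    unsigned int next = (w + 1 == QUE_LEN) ? 0 : w + 1;

    /* acquire: the reader must be done with the slot before it is reused */
    if (next == atomic_load_explicit(&my_que_read_index, memory_order_acquire))
        return false;

    my_que[w].param1 = p1;
    my_que[w].param2 = p2;

    /* release: the payload stores above cannot be moved after this store */
    atomic_store_explicit(&my_que_write_index, next, memory_order_release);
    return true;
}

/* Reader side (CPU 2). Returns false if the queue is empty. */
bool que_read(unsigned int *p1, unsigned int *p2)
{
    unsigned int r = atomic_load_explicit(&my_que_read_index, memory_order_relaxed);

    /* acquire: pairs with the writer's release, so the payload is visible */
    if (r == atomic_load_explicit(&my_que_write_index, memory_order_acquire))
        return false;

    *p1 = my_que[r].param1;
    *p2 = my_que[r].param2;

    /* release: the payload loads above complete before the slot is handed back */
    atomic_store_explicit(&my_que_read_index,
                          (r + 1 == QUE_LEN) ? 0 : r + 1,
                          memory_order_release);
    return true;
}
```

The release store on the index paired with the acquire load on the other side is what forbids the reordering shown above. One slot is deliberately left unused so that "full" and "empty" can be told apart.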
Context
Debian 64.
Core 2 duo.
Fiddling with a loop. I came up with several variations of the same loop, but I would like to avoid conditional branching if possible,
even though I think it will be difficult to beat.
I thought about SSE or bit shifting, but it would still require a jump (look at the computed goto below). Spoiler: a computed jump doesn't seem to be the way to go.
The code is compiled without PGO, because for this piece of code PGO makes the result slower.
flags :
gcc -march=native -O3 -std=c11 test_comp.c
Unrolling the loop didn't help here.
63 in ascii is '?'.
The printf is here to force the code to execute. Nothing more.
My need :
Some logic that avoids the condition. I'm treating this as a challenge to fill my holidays :)
The code :
Test with the sentence. The character '?' is guaranteed to be there but at a random position.
hjkjhqsjhdjshnbcvvyzayuazeioufdhkjbvcxmlkdqijebdvyxjgqddsyduge?iorfe
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char **argv){
/* This is quite slow. Average actually.
Executes in 369,041 cycles here (cachegrind) */
for (int x = 0; x < 100; ++x){
if (argv[1][x] == 63){
printf("%d\n",x);
break;
}
}
/* This is the slowest.
Executes in 370,385 cycles here (cachegrind) */
register unsigned int i = 0;
static void * restrict table[] = {&&keep,&&end};
keep:
++i;
goto *table[(argv[1][i-1] == 63)];
end:
printf("i = %d",i-1);
/* This is slower. Because of the calculation..
Executes in 369,109 cycles here (cachegrind) */
for (int x = 100; ; --x){
if (argv[1][100 - x ] == 63){printf("%d\n",100-x);break;}
}
return 0;
}
Question
Is there a way to make it faster, maybe by avoiding the branch?
The branch miss rate is huge at 11.3% (cachegrind with --branch-sim=yes).
I cannot believe this is the best one can achieve.
If any of you are talented enough with assembly, please chime in.
Assuming you have a buffer of well-known size that can hold the maximum number of chars to test against, like
char buffer[100];
make it one byte larger
char buffer[100 + 1];
then fill it with the sequence to test against
read(fileno(stdin), buffer, 100);
and put your test-char '?' at the very end
buffer[100] = '?';
This allows a loop with only one test condition:
size_t i = 0;
while ('?' != buffer[i])
{
++i;
}
if (100 == i)
{
/* test failed */
}
else
{
/* test passed for i */
}
Leave all other optimisation to the compiler.
However, I couldn't resist, so here's a possible approach to micro-optimisation:
char buffer[100 + 1];
read(fileno(stdin), buffer, 100);
buffer[100] = '?';
char * p = buffer;
while ('?' != *p)
{
++p;
}
if ((p - buffer) == 100)
{
/* test failed */
}
else
{
/* test passed for (p - buffer) */
}
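The sentinel trick above can also be wrapped in a small function. This is just a sketch: find_with_sentinel is a made-up name, and it takes the number of valid bytes as a parameter so the sentinel goes right after the data that was actually filled in (read() may return fewer bytes than requested, in which case the rest of the buffer is not initialised).

```c
#include <stddef.h>

/* Search buf[0..len) for ch using a sentinel.
 * The caller must guarantee that buf has room for one extra byte at
 * buf[len]; that byte is overwritten with the sentinel.
 * Returns the index of the first match, or len if ch is not in the data. */
size_t find_with_sentinel(char *buf, size_t len, char ch)
{
    size_t i = 0;

    buf[len] = ch; /* the sentinel guarantees that the loop terminates */
    while (buf[i] != ch)
        ++i;
    return i;
}
```

A caller would pass the byte count returned by read() as len and treat a result equal to len as "not found". The conditional branch on the character itself is still there; only the bounds check per iteration is gone.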
I'm developing on an AD Blackfin BF537 DSP running uClinux. I have a total of 32MB SD-RAM available. I have an ADC attached, which I can access using a simple, blocking call to read().
The most interesting part of my code is below. Running the program seems to work just fine, I get a nice data package that I can fetch from the SD-card and plot. However, if I comment out the float calculation part (as noted in the code), I get only zeroes in the ft_all.raw file. The same occurs if I change optimization level from -O3 to -O0.
I've tried countless combinations of all sorts of things, and sometimes it works, sometimes it does not - earlier (with minor modifications to below), the code would only work when optimization was disabled. It may also break if I add something else further down in the file.
My suspicion is that the data transferred by the read() function may not have been transferred fully (is that possible, even though it returns the correct number of bytes?). This is also the first time I have initialized pointers using direct memory addresses, and I have no idea how the compiler reacts to this. Perhaps I missed something here?
I've spent days on this issue now, and I'm getting desperate - I would really appreciate some help on this one! Thanks in advance.
// Clear the top 16M memory for data processing
memset((int *)0x01000000,0x0000,(size_t)SIZE_16M);
/* Prep some pointers for data processing */
int16_t *buffer;
int16_t *buf16I, *buf16Q;
buffer = (int16_t *)(0x1000000);
buf16I = (int16_t *)(0x1600000);
buf16Q = (int16_t *)(0x1680000);
/* Read data from ADC */
int rbytes = read(Sportfd, (int16_t*)buffer, 0x200000);
if (rbytes != 0x200000) {
printf("could not sample data! %X\n",rbytes);
goto end;
} else {
printf("Read %X bytes\n",rbytes);
}
FILE *outfd;
int wbytes;
/* Commenting this region results in all zeroes in ft_all.raw */
float a,b;
int c;
b = 0;
for (c = 0; c < 1000; c++) {
a = c;
b = b+pow(a,3);
}
printf("b is %.2f\n",b);
/* Only 12 LSBs of each 32-bit word is actual data.
* First 20 bits of nothing, then 12 bits I, then 20 bits
* nothing, then 12 bits Q, etc...
* Below, the I and Q parts are scaled with a factor of 16
* and extracted to buf16I and buf16Q.
* */
int32_t *buf32;
buf32 = (int32_t *)buffer;
uint32_t i = 0;
uint32_t n = 0;
while (n < 0x80000) {
buf16I[i] = buf32[n] << 4;
n++;
buf16Q[i] = buf32[n] << 4;
i++;
n++;
}
printf("Saving to /mnt/sd/d/ft_all.raw...");
outfd = fopen("/mnt/sd/d/ft_all.raw", "w+");
if (outfd == NULL) {
printf("Could not open file.\n");
}
wbytes = fwrite((int*)0x1600000, 1, 0x100000, outfd);
fclose(outfd);
if (wbytes < 0x100000) {
printf("wbytes not correct (= %d) \n", (int)wbytes);
}
printf(" done.\n");
Edit: The code seems to work perfectly well if I use read() to read data from a simple file rather than the ADC. This leads me to believe that the rather hacky-looking code when extracting the I and Q parts of the input is working as intended. Inspecting the assembly generated by the compiler confirms this.
I'm trying to get in touch with the developer of the ADC driver to see if he has an explanation of this behaviour.
The ADC is connected through a SPORT, and is opened as such:
sportfd = open("/dev/sport1", O_RDWR);
ioctl(sportfd, SPORT_IOC_CONFIG, spconf);
And here are the options used when configuring the SPORT:
spconf->int_clk = 1;
spconf->word_len = 32;
spconf->serial_clk = SPORT_CLK;
spconf->fsync_clk = SPORT_CLK/34;
spconf->fsync = 1;
spconf->late_fsync = 1;
spconf->act_low = 1;
spconf->dma_enabled = 1;
spconf->tckfe = 0;
spconf->rckfe = 1;
spconf->txse = 0;
spconf->rxse = 1;
A bfin_sport.h file from Analog Devices is also included: https://gist.github.com/tausen/5516954
Update
After a long night of debugging with the previous developer on the project, it turned out the issue was not related to the code shown above at all. As Chris suggested, it was indeed an issue with the SPORT driver and the ADC configuration.
While debugging, this error message appeared whenever the data was "broken": bfin_sport: sport ffc00900 status error: TUVF. While this doesn't make much sense in the application, it was clear from printing the data that something was out of sync: the data in buffer was of the form 0x12000000,0x34000000,... rather than 0x00000012,0x00000034,... whenever the status error was shown. It seems clear, then, why buf16I and buf16Q only contained zeroes (since I am extracting the 12 LSBs).
Putting in a few calls to usleep() between stages of ADC initialization and configuration seems to have fixed the issue - I'm hoping it stays that way!
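To illustrate why the out-of-sync words ended up as zeroes, here is a small sketch of the extraction step in isolation. The helper name extract_sample is made up; it does the same thing as the buf32[n] << 4 assignment into an int16_t buffer above, just with the truncation to 16 bits written out explicitly.

```c
#include <stdint.h>

/* Keep the 12 LSBs of a 32-bit word, scaled by 16, as a 16-bit sample.
 * The shift is done on an unsigned value and the result is cut down to
 * its low 16 bits; the final conversion to int16_t wraps on the usual
 * two's-complement compilers, as the original assignment does. */
int16_t extract_sample(uint32_t word)
{
    return (int16_t)(uint16_t)(word << 4);
}
```

With a correctly aligned word such as 0x00000012 this yields 0x0120, whereas for 0x12000000 all the data bits sit above bit 15 after the shift and the result is 0.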
I'm new at multi-threaded programming and I tried to code the Bakery Lock Algorithm in C.
Here is the code:
int number[N]; // N is the number of threads
int choosing[N];
void lock(int id) {
choosing[id] = 1;
number[id] = max(number, N) + 1;
choosing[id] = 0;
for (int j = 0; j < N; j++)
{
if (j == id)
continue;
while (1)
if (choosing[j] == 0)
break;
while (1)
{
if (number[j] == 0)
break;
if (number[j] > number[id]
|| (number[j] == number[id] && j > id))
break;
}
}
}
void unlock(int id) {
number[id] = 0;
}
Then I run the following example. I run 100 threads and each thread runs the following code:
for (i = 0; i < 10; ++i) {
lock(id);
counter++;
unlock(id);
}
After all threads have been executed, the result of the shared counter is 10 * 100 = 1000 which is the expected value. I executed my program multiple times and the result was always 1000. So it seems that the implementation of the lock is correct. That seemed weird based on a previous question I had because I didn't use any memory barriers/fences. Was I just lucky?
Then I wanted to create a multi-threaded program that will use many different locks. So I created this (full code can be found here):
typedef struct {
int number[N];
int choosing[N];
} LOCK;
and the code changes to:
void lock(LOCK l, int id)
{
l.choosing[id] = 1;
l.number[id] = max(l.number, N) + 1;
l.choosing[id] = 0;
...
Now when executing my program, sometimes I get 997, sometimes 998, sometimes 1000. So the lock algorithm isn't correct.
What am I doing wrong? What can I do in order to fix it?
Is it perhaps a problem now that I'm reading arrays number and choosing from a struct and that's not atomic or something?
Should I use memory fences and if so at which points (I tried using asm("mfence") in various points of my code, but it didn't help)?
With pthreads, the standard states that accessing a variable in one thread while another thread is, or might be, modifying it is undefined behavior. Your code does this all over the place. For example:
while (1)
if (choosing[j] == 0)
break;
This code accesses choosing[j] over and over while waiting for another thread to modify it. The compiler is entirely free to modify this code as follows:
int cj=choosing[j];
while(1)
if(cj == 0)
break;
Why? Because the standard is clear that another thread may not modify the variable while this thread may be accessing it, so the value can be assumed to stay the same. But clearly, that won't work.
It can also do this:
while(1)
{
int cj=choosing[j];
if(cj==0) break;
choosing[j]=cj;
}
Same logic. It is perfectly legal for the compiler to write back a variable whether it has been modified or not, so long as it does so at a time when the code could be accessing the variable. (Because, at that time, it's not legal for another thread to modify it, so the value must be the same and the write is harmless. In some cases, the write really is an optimization and real-world code has been broken by such writebacks.)
If you want to write your own synchronization functions, you have to build them with primitive functions that have the appropriate atomicity and memory visibility semantics. You must follow the rules or your code will fail, and fail horribly and unpredictably.
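As one possible set of such primitives, here is a sketch of the lock built on C11 atomics from <stdatomic.h>, using the default sequentially consistent operations. Treat it as an illustration under assumptions rather than a verified implementation: N is given an arbitrary value, max_ticket is a made-up helper standing in for max, and the lock is passed by pointer so that every thread operates on the same LOCK object (a LOCK passed by value is copied on each call).

```c
#include <stdatomic.h>

#define N 4 /* number of threads; arbitrary value for this sketch */

typedef struct {
    atomic_int number[N];
    atomic_int choosing[N];
} LOCK;

/* Hypothetical helper: largest ticket currently held by any thread. */
static int max_ticket(LOCK *l)
{
    int m = 0;
    for (int i = 0; i < N; i++) {
        int v = atomic_load(&l->number[i]);
        if (v > m)
            m = v;
    }
    return m;
}

void lock(LOCK *l, int id)
{
    atomic_store(&l->choosing[id], 1);
    atomic_store(&l->number[id], max_ticket(l) + 1);
    atomic_store(&l->choosing[id], 0);

    for (int j = 0; j < N; j++) {
        if (j == id)
            continue;
        /* Every iteration performs a real load; the compiler may not
         * hoist an atomic load out of the loop. */
        while (atomic_load(&l->choosing[j]) != 0)
            ;
        for (;;) {
            int nj = atomic_load(&l->number[j]);
            int ni = atomic_load(&l->number[id]);
            if (nj == 0 || nj > ni || (nj == ni && j > id))
                break;
        }
    }
}

void unlock(LOCK *l, int id)
{
    atomic_store(&l->number[id], 0);
}
```

A LOCK object with static storage duration starts out zeroed, which is the unlocked state. Each thread would then call lock(&l, id) and unlock(&l, id) around its critical section.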
Please refer to my code below. When optimization in IAR MSP430 compiler is set high, I am having the following issue. Code works fine when optimization is low.
Issue: If the condition statement at (B) returns false, statement (A) is executed instead of statement (C).
int16_t cpu_flash_read_setting (void * setting, const uint8_t offset, const uint8_t num_of_bytes)
{
int16_t returnable_status = PASS;
uint16_t flash_copy_one_address = FLASH_INFO_SEG_C_ADDR + offset;
uint16_t flash_copy_two_address = FLASH_INFO_SEG_D_ADDR + offset;
if (0U == (num_of_bytes % sizeof(uint16_t)))
{
uint16_t *setting_copy_one = (uint16_t *) flash_copy_one_address;
uint16_t *setting_copy_two = (uint16_t *) flash_copy_two_address;
if (*setting_copy_one == *setting_copy_two)
{
setting = setting_copy_one;
}
else
{
(A) returnable_status = FAIL;
}
}
else if (0U == (num_of_bytes % sizeof(uint8_t)))
{
uint8_t *setting_copy_one = (uint8_t *) flash_copy_one_address;
uint8_t *setting_copy_two = (uint8_t *) flash_copy_two_address;
(B) if (*setting_copy_one == *setting_copy_two)
{
setting = setting_copy_one;
}
else
{
(C) returnable_status = FAIL;
}
}
else
{
/* No Action */
}
return returnable_status;
}
That looks entirely reasonable to me. When you have optimisation turned up high, the compiler can and usually will re-order statements wildly. Your two main clauses are identical apart from their typing - so it's entirely plausible for the compiler to merge the execution paths and have them differ only where it actually matters.
This is only a problem if the actual observable effect differs from what was intended.
In any event, optimised code is always difficult to follow with a debugger, precisely because of the re-ordering effects.
By the way, if your code is talking to actual hardware you may want to declare the pointers derived from the flash_copy_*_address variables as pointers to volatile. This is a hint to the compiler that the memory they point to doesn't necessarily behave in the normal way, and forces it to be more conservative with its optimisations.
The two lines of code A and C are identical, and the execution paths merge after those two lines (the next line to be executed in both cases is return returnable_status;).
Thus the compiler is performing tail-merging optimisation, using the same block of assembly code for both source code lines. This optimisation is expected and perfectly valid, and should not cause a problem.
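As a rough illustration of tail merging, the sketch below shows a function shaped like the one in the question next to a hand-written version of what the optimiser may effectively turn it into. The function names and the values of PASS and FAIL are made up for the example, and the real IAR output will look different; the point is only that both versions return the same result for the same inputs.

```c
#include <stdint.h>

#define PASS 0    /* assumed value, for illustration only */
#define FAIL (-1) /* assumed value, for illustration only */

/* Shaped like the question: two branches ending in the same statement. */
int16_t check_as_written(const void *one, const void *two, uint8_t num_of_bytes)
{
    int16_t status = PASS;

    if (num_of_bytes % sizeof(uint16_t) == 0) {
        if (*(const uint16_t *)one != *(const uint16_t *)two)
            status = FAIL; /* (A) */
    } else {
        if (*(const uint8_t *)one != *(const uint8_t *)two)
            status = FAIL; /* (C) */
    }
    return status;
}

/* Roughly what tail merging amounts to: the branches differ only in how
 * the comparison is done, and share a single copy of the failure path. */
int16_t check_as_merged(const void *one, const void *two, uint8_t num_of_bytes)
{
    int equal;

    if (num_of_bytes % sizeof(uint16_t) == 0)
        equal = *(const uint16_t *)one == *(const uint16_t *)two;
    else
        equal = *(const uint8_t *)one == *(const uint8_t *)two;

    if (equal)
        return PASS;
    return FAIL; /* the one shared copy of (A) and (C) */
}
```

When stepping through the merged code, a debugger has to attribute the shared instructions to one source line, so it can show (A) even though the path through (B) was taken.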