Watchdog timeout is too short

I have a conceptual question. I'm working on a project that has to implement a watchdog timer to ensure that the code works properly. I'm using an STM32F4, and from the datasheet I can see that the maximum timeout allowed by the IWDG (independent watchdog) is 32768 ms. I'm using a SIM800L for communication via GPRS, and some communications take longer than that. During this process the MCU is busy waiting for the answers, so it cannot reset the IWDG. I was therefore thinking of deactivating the watchdog in those parts, or implementing my own watchdog with a timer and a simple reset function so that I can have longer timeout periods.
My question is:
Is this a sign of a flaw in my code design? Should I instead adapt my code to reset the IWDG every 30 seconds or so and never deactivate it? Is implementing my own WDG with a timer bad practice?

Is this a sign of a flaw in my code design? Should I instead adapt my code to renew the IWDG every 30 seconds or so?
No, you simply need to write the key register or load a new value to the downcounter before the downcounter reaches zero. It shows the watchdog that your software is alive and no reset is needed.
During this process the MCU is busy waiting for the answers, so it cannot reset the IWDG
This means that your implementation is bad. You need to implement it in a non-blocking way. It is not difficult.
Is implementing my own WDG with a timer bad practice?
It is a very bad idea. What will happen if your program hard-faults? Your own watchdog will be useless. The hardware WDG is also clocked from its own clock source, so if your program does something wrong with the clocks, it will still work.
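As a side note on where the 32768 ms limit in the question comes from: the IWDG timeout is set by a prescaler and a 12-bit reload value counted on the LSI clock. The sketch below only does that arithmetic, assuming a nominal 32 kHz LSI (the real LSI frequency varies noticeably between parts). The function name is made up for illustration and is not part of any ST library.

```c
#include <stdint.h>

/* Assumed nominal LSI frequency in Hz; the real oscillator varies between parts. */
#define LSI_HZ_NOMINAL 32000u

/* Hypothetical helper: IWDG timeout in milliseconds for a given prescaler
 * divider (4..256) and 12-bit reload value (0..4095), assuming the counter
 * runs for (reload + 1) ticks at LSI / prescaler. */
uint32_t iwdg_timeout_ms(uint32_t prescaler, uint32_t reload)
{
    uint64_t ticks = (uint64_t)prescaler * (reload + 1u);
    return (uint32_t)((ticks * 1000u) / LSI_HZ_NOMINAL);
}
```

With the largest divider (256) and the largest reload value (4095) this gives 32768 ms, which matches the figure quoted from the datasheet.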

Programs should never deactivate the watchdog in run-time, as that defeats the purpose of having a watchdog in the first place. Many watchdog hardware peripherals don't even allow you to disable it once enabled.
You cannot implement your own watchdog using timers, because the watchdog hardware explicitly uses a different timer from those available to the application programmer. So if your program halts for whatever reason, your timer solution will halt as well. Forget about implementing watchdogs using on-chip timers or software. You can only implement your own watchdog using external hardware, such as a binary counter IC or a monostable multivibrator IC.
Is this a sign of a flaw on my code design?
It is - you should not busy-wait for external resources to become available. Rather than
while(some_serial_bus == BUSY) {} // bad, busy wait
you should be doing:
for(;;)
{
    kick_wdog();
    if(some_serial_bus != BUSY) // good, polling
    {
        do_stuff();
    }
}
When implementing the driver for the external serial bus you should provide a method to check if data is available, then allow the caller to decide whether to busy wait for that function or not. An ideal, properly written driver should never contain any busy waits nor should it contain any "sleep/delay" calls.
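One way to picture the driver interface described above, which also covers the case where the modem never answers, is a poll function that reports "pending", "done" or "timed out" and leaves the caller free to kick the watchdog between calls. This is only a sketch: `bus_op_t`, `bus_op_start`, `bus_op_poll` and the millisecond tick passed in by the caller are assumptions, not part of any real SIM800L or STM32 API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BUS_PENDING, BUS_DONE, BUS_TIMEOUT } bus_status_t;

typedef struct {
    uint32_t start_ms;    /* tick value when the operation was started */
    uint32_t timeout_ms;  /* how long the caller is willing to wait    */
} bus_op_t;

/* Record the start time and the allowed duration; returns immediately. */
void bus_op_start(bus_op_t *op, uint32_t now_ms, uint32_t timeout_ms)
{
    op->start_ms = now_ms;
    op->timeout_ms = timeout_ms;
}

/* Never blocks. `response_ready` stands in for whatever flag the UART
 * receive path sets when a complete answer has arrived. */
bus_status_t bus_op_poll(const bus_op_t *op, uint32_t now_ms, bool response_ready)
{
    if (response_ready) {
        return BUS_DONE;
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - op->start_ms) >= op->timeout_ms) {
        return BUS_TIMEOUT;
    }
    return BUS_PENDING;
}
```

The main loop would call `bus_op_poll()` once per iteration next to `kick_wdog()`, and treat `BUS_TIMEOUT` as an error to handle rather than something to wait out.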

I don't think you can stop the IWDG once it starts (nor would you want to). I'm not familiar with the SIM800L, but your best bet would be to find a way to kick the watchdog intermittently while GPRS is operating. You want to do this in firmware, not hardware. (Don't use a HW timer to kick the WDT, because if your SW crashes, the HW timer could keep doing its thing.) Alternatively, the STM32F4 also has a window watchdog (WWDG) you could use, but check the reference manual for its timeout range first: the WWDG is clocked from the APB1 bus clock and its maximum timeout is typically much shorter than the IWDG's.

Related

Making driver library for a slow module, watchdog friendly

Context
I'm making some libraries to manage internet protocols through GPRS. Some parts of these communications (done over UART) are rather slow (some can take more than 30 seconds) because the module has to connect through GPRS.
First I made a driver library to control the module and manage TCP/IP connections. This library worked with blocking functions; for example, a function like Init_GPRS_connection() could take several seconds to finish. It has been pointed out to me that this is bad practice, because now I have to implement a watchdog timer, and this kind of function is not friendly to the short timeouts that watchdogs have (I cannot kick the timer before it expires).
What I have thought of
I need to rewrite part of my libraries to be watchdog friendly. For this purpose I have thought of the following scheme: I need functions that have a state machine inside, which poll the data acquired through UART interrupts to advance through the state machine, so that I can write code like:
GPRS_typef Init_GPRS_connection(){
    switch(state){ // state would be a global variable that holds the current state of the state machine
        .... // here would be all the states of the state machine
        case end:
            state = 0;
            return Done;
    }
}
while(Init_GPRS_connection() != Done){
    Do_stuff(); // Like kick the watchdog
}
But I see a few problems with this solution:
This is a less user-friendly implementation: the user has to be careful when using this driver library, because extra lines of code are always necessary (which somewhat defeats the purpose of using functions).
If, for some reason, the module doesn't answer at some point, the code would get stuck in the state machine, because the watchdog would keep being kicked from outside this function even though the code is stuck in a loop. This somewhat defeats the purpose of using a watchdog timer.
My question
What kind of implementation should I use to make a driver library that is both user friendly and watchdog friendly? How do other driver libraries manage this?
Extra information
All this in the context of embedded systems
I would like to implement the watchdog kicking action outside the driver's functions

Given where you are, and assuming you do not want too much upheaval to your project to "do it properly", what you might do is add a variable watchdog timeout extension: you set a counter that is decremented in a timer interrupt, and as long as the counter is not zero, the timer interrupt resets the watchdog.
That way you are not allowing the timer interrupt to reset the watchdog indefinitely while your main thread is stuck, but you can extend the watchdog immediately before executing any blocking code, essentially setting a timeout for that operation.
So you might have (pseudocode):
static volatile uint8_t wdg_repeat_count = 0 ;
void extendWatchdog( uint8_t repeat ) { wdg_repeat_count = repeat ; }
void timerISR( void )
{
    if( wdg_repeat_count > 0 )
    {
        resetWatchdog() ;
        wdg_repeat_count-- ;
    }
}
Then you can either:
extendWatchdog( CONNECTION_INIT_WDG_TIMEOUT ) ;
while(Init_GPRS_connection() != Done){
    Do_stuff(); //Like kick the Watchdog
}
or continue to use your existing non-state-machine based solution:
extendWatchdog( CONNECTION_INIT_WDG_TIMEOUT ) ;
bool connected = Init_GPRS_connection() ;
if( connected ) ...
The idea is compatible with both what you have and what you propose, it simply allows you to extend the watchdog timeout beyond that dictated by the hardware.
I suggest a uint8_t, because it prevents a lazy developer simply setting a large value and effectively disabling the watchdog protection, and it is likely to be atomic and so shareable between the main and interrupt context.
All that said, it would clearly have been better to design in your integrity infrastructure from the outset at the architectural level rather than trying to bolt it on after the event. For example, if you were using an RTOS, you might reset the watchdog in a low priority task that, if starved, would cause a watchdog expiry, and that "watchdog task" could be used to monitor the other tasks to ensure they are scheduling as expected.
Without an RTOS you might have a "big-loop" architecture with each "task" implemented as a state-machine. In your example you seem to have missed the point of a state-machine. "Initialising connection" should be a single state of a high level state-machine; the internals of that state may themselves be a state-machine (hierarchical state machines). So your entire system would be a single master state-machine in the main loop, with the watchdog reset once at each loop iteration. Nothing in any sub-state should block, so that the loop time stays low and deterministic. That is how, for example, the Arduino framework's loop() function should work (when done properly, which is unfortunately seldom the case in examples). To understand how to implement a real-time deterministic state-machine architecture you could do worse than look at the work of Miro Samek. The framework described therein is available via his company.
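A minimal sketch of that watchdog-task idea, assuming each monitored task gets one bit in a check-in mask. The names `task_checkin` and `wdg_monitor_should_kick` and the task bits are hypothetical, and on a real system the updates to the mask would need to be atomic or protected by a critical section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task identifiers, one bit each. */
#define TASK_COMMS   (1u << 0)
#define TASK_SENSORS (1u << 1)
#define TASK_UI      (1u << 2)
#define ALL_TASKS    (TASK_COMMS | TASK_SENSORS | TASK_UI)

static volatile uint32_t checkin_mask;

/* Called by each monitored task once per cycle of its own work. */
void task_checkin(uint32_t task_bit)
{
    checkin_mask |= task_bit;
}

/* Called periodically by the low-priority watchdog task. Returns true only
 * if every task has checked in since the mask was last cleared; the caller
 * kicks the hardware watchdog on true and lets it run down otherwise. */
bool wdg_monitor_should_kick(void)
{
    bool all_alive = (checkin_mask & ALL_TASKS) == ALL_TASKS;
    if (all_alive) {
        checkin_mask = 0u;  /* start a new monitoring period */
    }
    return all_alive;
}
```

If any one task stops checking in, or the watchdog task itself is starved, the hardware watchdog is no longer kicked and eventually expires.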

You should make your library non-blocking, but other than that, you should not worry about the watchdog at all. The watchdog management should be left to the user.
To allow the user to do other work while your library is waiting, you can use these approaches:
1. Provide a function to feed the data into your library (e.g. receive()). The user should call this function when the data is available, for example from the interrupt. As this function can be called from the interrupt, make sure it does not do heavy processing. Typically, you would just buffer the data and process it later (Step 2).
2. Provide a function, that user calls periodically, that updates the state of your library and does any other housekeeping tasks (like timeout detection). Typically, this function is called run(), process(), tick() or something along these lines. The user would call this function in their main loop or from a dedicated RTOS task.
3. Provide a way to tell the user the state of your library. You can do it either by some sort of getState() function or using a callback or both. Based on this information, the user can implement their own state machine to do things on connect, disconnect etc.
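The three pieces above might fit together roughly like this. It is a sketch, not a real modem driver: the `gprs_*` names, the states, the 64-byte buffer and the rule that a received line containing "OK" means "connected" are all invented for illustration.

```c
#include <stdint.h>
#include <string.h>

#define GPRS_RX_SIZE 64u

typedef enum { GPRS_IDLE, GPRS_CONNECTING, GPRS_CONNECTED, GPRS_FAILED } gprs_state_t;

typedef struct {
    gprs_state_t state;
    volatile uint8_t rx[GPRS_RX_SIZE];  /* ring buffer filled from the ISR   */
    volatile uint32_t head;             /* advanced only by gprs_receive()   */
    volatile uint32_t tail;             /* advanced only by gprs_process()   */
    char line[GPRS_RX_SIZE];            /* current response line being built */
    uint32_t line_len;
    uint32_t start_ms;
    uint32_t timeout_ms;
} gprs_t;

/* Start a connection attempt; returns immediately. */
void gprs_connect(gprs_t *g, uint32_t now_ms, uint32_t timeout_ms)
{
    g->head = g->tail = 0u;
    g->line_len = 0u;
    g->start_ms = now_ms;
    g->timeout_ms = timeout_ms;
    g->state = GPRS_CONNECTING;
    /* A real driver would queue the AT commands for transmission here. */
}

/* Feed one received byte; cheap enough to call from the UART ISR. */
void gprs_receive(gprs_t *g, uint8_t byte)
{
    uint32_t next = (g->head + 1u) % GPRS_RX_SIZE;
    if (next != g->tail) {              /* drop the byte if the buffer is full */
        g->rx[g->head] = byte;
        g->head = next;
    }
}

/* Called periodically from the main loop; never blocks. */
void gprs_process(gprs_t *g, uint32_t now_ms)
{
    while (g->tail != g->head) {
        char c = (char)g->rx[g->tail];
        g->tail = (g->tail + 1u) % GPRS_RX_SIZE;
        if (c == '\n') {
            g->line[g->line_len] = '\0';
            if (g->state == GPRS_CONNECTING && strstr(g->line, "OK") != NULL) {
                g->state = GPRS_CONNECTED;
            }
            g->line_len = 0u;
        } else if (g->line_len < GPRS_RX_SIZE - 1u) {
            g->line[g->line_len++] = c;
        }
    }
    if (g->state == GPRS_CONNECTING &&
        (uint32_t)(now_ms - g->start_ms) >= g->timeout_ms) {
        g->state = GPRS_FAILED;         /* timeout detection lives in the library */
    }
}

/* Let the user query the state and build their own logic on top of it. */
gprs_state_t gprs_get_state(const gprs_t *g)
{
    return g->state;
}
```

The user's main loop then calls `gprs_process()` and kicks the watchdog on every iteration, and because the timeout is handled inside the library, a silent module ends in `GPRS_FAILED` instead of an endless wait.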

How to Disable/Delay the watchDog Timer for a certain Task in an embedded system

I'm working on a project for an automotive system where we use the MPC5748 MCU. The application uses an RTOS based on AUTOSAR OS, and this MPC target supports two types of watchdog: software and hardware (they have used the software WDT).
My mission is to fit an algorithm into this application. The algorithm itself has been developed; the problem is that the task where the algorithm runs is a 1ms task, and the algorithm needs much more time than is allotted to this function.
I'm a newbie to the embedded world. By the way, in the algorithm's main function the program resets itself, and this seems to be a timeout generated by the expiration of the watchdog.
My questions are:
Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task, since we are receiving CAN commands.
Can I add a sleep(<1ms) in the desired function in order to wait a little without affecting other tasks?
What are other options to try?
NB: This is a general problem with the watchdog timer, and any useful information would be very helpful. Sorry, I can't share the code.

Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
Let's forget that one; it is a really bad idea. If it is possible to defeat the watchdog, then it is possible to do it by error, and then the whole point of the watchdog is defeated. Apart from that, it's an XY question (a question about your proposed solution to a different problem); you should ask about the problem directly.
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task, since we are receiving CAN commands.
Yes, you need another task, but you should not add a "big delay"; it is probably unnecessary and certainly bad design. If the 1ms task needs the result of the algorithm, then the algorithm should run in a service task that is triggered by the 1ms task and runs asynchronously to it. The service task then makes the results available to the 1ms task when they are ready (by shared memory or message passing, perhaps). Alternatively, if the result is not specifically needed by the 1ms task, the service task could take the necessary action independently of the 1ms task.
There are many options, but essentially it seems that your task partitioning is inappropriate; your CAN Rx task should be responsible for receiving CAN messages only, and any action required in response to CAN messages should be deferred to one or more other tasks, perhaps fed from a message queue.
What are other options to try?
Software design should not be a matter of trial and error: get the design right, then implement the design. However, you might consider whether 1ms is appropriate; is it possible that the period can be extended to encompass the worst case execution time without causing a failure to meet deadlines in general? If the answer is "no", then the algorithm does not belong in this task.
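As a rough illustration of that partitioning, the 1ms task could push received frames into a small queue and return at once, while a lower-priority service task drains the queue and runs the long algorithm. The sketch below is a plain single-producer, single-consumer ring buffer that assumes both tasks run on the same core. `can_frame_t`, `canq_push` and `canq_pop` are invented names, and a real AUTOSAR OS project would more likely use the inter-task communication services its OS already provides.

```c
#include <stdbool.h>
#include <stdint.h>

#define CANQ_SIZE 8u  /* holds CANQ_SIZE - 1 frames; size it for the worst-case burst */

typedef struct {
    uint32_t id;
    uint8_t  len;
    uint8_t  data[8];
} can_frame_t;

typedef struct {
    can_frame_t slots[CANQ_SIZE];
    volatile uint32_t head;  /* written only by the producer (1ms task)     */
    volatile uint32_t tail;  /* written only by the consumer (service task) */
} canq_t;

/* Producer side: returns false instead of waiting when the queue is full. */
bool canq_push(canq_t *q, const can_frame_t *f)
{
    uint32_t next = (q->head + 1u) % CANQ_SIZE;
    if (next == q->tail) {
        return false;
    }
    q->slots[q->head] = *f;
    q->head = next;
    return true;
}

/* Consumer side: returns false when there is nothing to process. */
bool canq_pop(canq_t *q, can_frame_t *out)
{
    if (q->tail == q->head) {
        return false;
    }
    *out = q->slots[q->tail];
    q->tail = (q->tail + 1u) % CANQ_SIZE;
    return true;
}
```

Neither function ever blocks, so the 1ms task keeps its deadline regardless of how long the algorithm in the service task takes; a `false` from `canq_push()` signals that the queue is too small or the consumer too slow.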

I don't think you can disable or delay the watchdog timer, and even if you could, that's not a good option to go for.
I think the problem is that the task you are calling has a period of 1ms, which is very short for reading CAN messages and then acting on them. I think the minimum task period should be 5ms, and the optimal one 10ms.

Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
You should never disable the watchdog anywhere in your code.
It might not even be possible: on the MPC5x families you typically set up the watchdog once, and then for safety reasons all watchdog registers become read-only.
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task, since we are receiving CAN commands.
Ideally you should only service the watchdog from one single location in the program. Your CAN peripheral will be FlexCAN, which has a lot of available "mailboxes" for CAN messages. In most cases you shouldn't need to poll it; instead, a flag will be set when the desired message arrives.
So it isn't obvious to me why you would need a delay to wait for them. Simply do:
void the_task (void)
{
    wdog_refresh();
    ... // do other things
    if(can_message_available)
    {
        // do something with the message
    }
    ... // do other things
}
rather than
// BAD:
while(!can_message_available)
    ; // do nothing
Even if you need to use the CAN as FIFO and poll it repeatedly, you would still use the same approach. You'd just have to ensure that the task runs often enough that there will never be an overflow in the FIFO buffer.

Setting up watchdog_set_period to max value causes reboot

I don't know much about how a watchdog timer works in an embedded environment, and I am facing an issue related to one.
The maximum timeout value defined in one of the macros is 55, and when we try to set this value through the watchdog_set_period function, our board reboots.
#define Max_time_out 55
watchdog_set_period(int period) // Set watchdog's timeout counter
where period = 55
Is this expected, or what is the reason for the reboot?
We are writing this period value to a driver which we access through a file descriptor.

The link states this description on watchdog timers.
A watchdog timer is a piece of hardware that can be used to automatically detect software anomalies and reset the processor if any occur. Generally speaking, a watchdog timer is based on a counter that counts down from some initial value to zero. The embedded software selects the counter's initial value and periodically restarts it. If the counter ever reaches zero before the software restarts it, the software is presumed to be malfunctioning and the processor's reset signal is asserted. The processor (and the embedded software it's running) will be restarted as if a human operator had cycled the power.
You haven't posted the code, so we can't judge exactly what the problem is. If you wrote the code, check whether something in it is stopping the watchdog from being restarted in time and so causing the reset.

A watchdog timer is a special kind of timer usually found on embedded systems that is used to detect when the running software/firmware is hung up on some task. The watchdog timer is basically a countdown timer that counts from some initial value down to zero.
When zero is reached, the watchdog timer understands that the system is hung up and resets it.
Therefore, the running software must periodically update the watchdog timer (for example, in an infinite while loop) with a new value to stop it from reaching zero and causing a reset. When the running software is locked up doing a certain task and cannot update the watchdog timer (the refresh fails), the timer will eventually reach zero and a reset/reboot will occur.
So in summary, if you enable watchdog timer then you need to periodically refresh watchdog timer. Otherwise the board reboots when the watchdog timer expires.
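The countdown behaviour described above can be modelled in a few lines of host-side C. This is a simulation for illustration only, with made-up names; it does not touch any real watchdog hardware or driver.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t reload;   /* value the counter restarts from on each refresh */
    uint32_t counter;  /* current countdown value                         */
    bool     expired;  /* latched once the counter reaches zero           */
} sim_wdt_t;

/* Enable the simulated watchdog with the given initial value. */
void sim_wdt_start(sim_wdt_t *w, uint32_t reload)
{
    w->reload = reload;
    w->counter = reload;
    w->expired = false;
}

/* What the software must do periodically to prove it is alive. */
void sim_wdt_refresh(sim_wdt_t *w)
{
    if (!w->expired) {
        w->counter = w->reload;
    }
}

/* One timer tick; in hardware, reaching zero asserts the reset line. */
void sim_wdt_tick(sim_wdt_t *w)
{
    if (w->counter > 0u) {
        w->counter--;
    }
    if (w->counter == 0u) {
        w->expired = true;
    }
}
```

As long as `sim_wdt_refresh()` is called more often than every `reload` ticks, `expired` stays false; skip the refresh for long enough and the simulated reset fires.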

Watchdog configuration on Stellaris Launchpad LM4F120

I'm trying to configure the watchdog timer on the Stellaris Launchpad LM4F120.
The code is the following:
void configure_watchdog(void) {
    SYSCTL_RCGCWD_R = 0x1; /* Enabling Clock for WD0 */
    WATCHDOG0_LOAD_R = 0xffffffff; /* Setting initial value */
    WATCHDOG0_CTL_R = WDT_CTL_INTEN; /* Enabling interrupt generation */
}
This is supposed to be enough according to the datasheet.
The problem is that controller always falls to FaultISR and resets after it. I can't understand why.
What am I doing wrong?
EDIT: The controller does not reset. It just goes to FaultISR

Jumping to an ISR when the watchdog expires sounds like the correct behavior. What exactly are you doing inside your ISR code? If you are resetting the watchdog inside the ISR, then you shouldn't be seeing the microcontroller reset itself (based on your posted configuration code, at least). After you set up the watchdog, read the configuration register back out and make sure that it holds the value that you expect. Some of the bits in that register can only be set under certain circumstances, and it's possible that you're not running with the settings that you think you're using.
You mentioned that you were trying to use the watchdog timer as a generic downcounter. Could you use one of the general-purpose timers instead of the watchdog? You would still get an interrupt when time expired, but regular timers don't have the ability to reset the entire system.

You have to keep servicing the watchdog, otherwise it times out and calls whatever is set up for that exception. FaultISR would appear to be that in your case.
If you want the watchdog to do something else on the timeout you need to figure out how your particular toolchain connects functions to exception sources and map your new function correctly.
If you don't want the watchdog to expire (which is usually what it's there for, to catch errant code) then you need to service it regularly. The compiler vendor often provides a function or intrinsic to do this.

How do you test your interrupt handling module?

I've got an interrupt handling module which controls the interrupt controller hardware on an embedded processor. Now I want to add more tests to it. Currently, the tests only check whether nesting of interrupts works, by making two software interrupts from within an ISR, one with low priority and one with high priority. How can I test this module further?

I suggest that you try to create other stimuli as well.
Hardware interrupts can often also be triggered by software (for automatic testing) or from the debugger by setting a flag. Other options are an interrupt via I/O, or a timer interrupt. Or you can just set the interrupt bit in the interrupt controller via the debugger while you are single-stepping.
You can add some runtime checks on things which are not supposed to happen. Sometimes I elect to set output pins to monitor externally (nice if you have an oscilloscope or logic analyser...)
void low_prio_isr(void)
{
    LOW_PRIO_ISR = 1;
    if (1 == HIGH_PRIO_ISR)
    {
        /* This must never happen. Put a dummy statement here to allow a breakpoint in the debugger. */
    }
}
void high_prio_isr(void)
{
    HIGH_PRIO_ISR = 1;
}
The disadvantage of the software interrupt is that the moment is fixed; always the same instruction. I believe you would like to see evidence that it always works; deadlock free.
For interrupt service routines I find code reviews very valuable. In the end you can only test the situations you've imagined and at some point the effort of testing will be very high. ISRs are notoriously difficult to debug.
I think it is useful to provide tests for the following:
- isr is not interrupted for lower priority interrupt
- isr is not interrupted for same priority interrupt
- isr is interrupted for higher priority interrupt
- maximum nesting count within stack limitations.
Some of your tests may stay in the code as instrumentation (so you can monitor, for instance, the maximum nesting level).
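Instrumentation of that kind could look something like the following sketch, which tracks the current and maximum nesting depth and counts any case where a handler starts while a handler of equal or higher priority is still active. Everything here is hypothetical: `isr_trace_enter`, `isr_trace_exit`, the convention that a larger number means a higher priority, and the depth limit of 8. On real hardware the bookkeeping itself would need to run with interrupts masked, otherwise a nested interrupt could corrupt it.

```c
#include <stdint.h>

#define ISR_TRACE_MAX_DEPTH 8u

static uint8_t  active_prio[ISR_TRACE_MAX_DEPTH]; /* priorities of nested ISRs */
static uint32_t depth;                             /* current nesting depth     */
uint32_t isr_trace_max_depth;                      /* high-water mark           */
uint32_t isr_trace_violations;                     /* illegal preemptions seen  */

/* Call first thing in every ISR. Assumes a larger value = higher priority. */
void isr_trace_enter(uint8_t prio)
{
    /* An ISR may only preempt one of strictly lower priority. */
    if (depth > 0u && depth <= ISR_TRACE_MAX_DEPTH &&
        prio <= active_prio[depth - 1u]) {
        isr_trace_violations++;
    }
    if (depth < ISR_TRACE_MAX_DEPTH) {
        active_prio[depth] = prio;
    }
    depth++;
    if (depth > isr_trace_max_depth) {
        isr_trace_max_depth = depth;
    }
}

/* Call last thing in every ISR. */
void isr_trace_exit(void)
{
    if (depth > 0u) {
        depth--;
    }
}
```

A test, or a debugger watch in the running system, can then check that `isr_trace_violations` stays at zero and that `isr_trace_max_depth` stays within what the stack allows.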
Oh, and one more thing: I've generally managed to keep ISRs so short that I can refrain from nesting.... if you can this will gain you additional simplicity and more performance.
[EDIT]
Of course, ISRs need to be tested on hardware in system too. Apart from the bit-by-bit, step-by-step approach you may want to prove:
- stability of the system at maximum interrupt load (preferably several times the predicted maximum load; if your 115 kbps serial driver can also handle 2 Mbps you'll be ok!)
- correct moment of enabling / disabling the ISR, especially if the system also enters a sleep mode
- number of interrupts. This can be surprising if you add mechanical switches or mechanical rotary controls (hundreds of make/break moments before reaching a steady state)

I recommend real-hardware testing. Interrupt handling is inherently random and unpredictable.
Use a signal generator and feed a square wave into the appropriate interrupt pin. Use multiple generators (or one with multiple outputs) to test multiple IRQ lines and verify priority handling.
Experiment with dialing the frequency up & down on the signal generators (vary the rates between them), and see what happens. Have lots of diagnostic code to verify the state of the interrupt controller in the various states.
Alternative: If your platform has timers that can trigger interrupts, you can use them instead of external hardware.

I'm not an embedded developer, so I don't know if this is possible, but how about decoupling the code that handles the interrupts from the callback-registration mechanism? This would allow you to write simulator code firing interrupt events as you like...
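In C, that decoupling might be sketched as a handler table with a dispatch function: on the target the interrupt controller's entry point calls the dispatcher, while in a host-side test the test code calls it directly to simulate an interrupt. The names (`irq_register`, `irq_dispatch`, `IRQ_COUNT`, `count_handler`) are illustrative assumptions and not tied to any particular interrupt controller.

```c
#include <stdbool.h>
#include <stddef.h>

#define IRQ_COUNT 16u

typedef void (*irq_handler_t)(void *ctx);

static struct {
    irq_handler_t fn;
    void *ctx;
} irq_table[IRQ_COUNT];

/* Registration: independent of how the interrupt is later raised. */
bool irq_register(unsigned irq, irq_handler_t fn, void *ctx)
{
    if (irq >= IRQ_COUNT || fn == NULL) {
        return false;
    }
    irq_table[irq].fn = fn;
    irq_table[irq].ctx = ctx;
    return true;
}

/* Dispatch: called by the hardware entry point on the target, or directly
 * by a test to simulate interrupt `irq` firing. Returns false if no
 * handler is registered for it. */
bool irq_dispatch(unsigned irq)
{
    if (irq >= IRQ_COUNT || irq_table[irq].fn == NULL) {
        return false;
    }
    irq_table[irq].fn(irq_table[irq].ctx);
    return true;
}

/* Example handler for demonstration: counts how often it was called. */
void count_handler(void *ctx)
{
    (*(int *)ctx)++;
}
```

A host test registers `count_handler` with a pointer to a counter, calls `irq_dispatch()` a few times and checks the count; this exercises the registration and dispatch logic, but not the priority and nesting behaviour of the real controller.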

For stuff like this I highly recommend something like the SPIN model checker. You wind up testing the algorithm, not the code, but the testing is exhaustive. Back in the day, I found a bug in gdb using this technique.
