I'm accessing U-Boot's console over a serial connection, and when U-Boot prompts me to enter commands, it seems that I have limited time to do so. I want to enter several commands, but I need more time.
Has anyone experienced such an issue, and how can I increase that time (if that is the problem)?
U-Boot's Boot Retry Mechanism, AKA, Preventing Eternally Hung Boot
Having the U-Boot command prompt timeout can actually be desirable behavior, as without this an inadvertent interruption of the boot could leave a system permanently stuck at the U-Boot prompt until the next power cycle.
Given this, in addition to the hardware watchdog possibility mentioned by Tom Rini, it is also possible that your U-Boot build could be set up with the "Boot Retry" feature - and not unlikely that others finding this page will (as I was) be seeking a way to intentionally cause such behavior.
If you see the following, you likely have boot retry:
Timeout waiting for command
resetting ...
Three build-time configuration options and one run-time variable govern boot retry:
CONFIG_BOOT_RETRY_TIME is the default number of seconds without a valid command, after which the (still interruptible) auto boot sequence will be automatically re-run.
bootretry is an environment variable containing the current delay in effect. Negative values mean boot retry will not occur. Unfortunately, this value is only sampled on startup - changing it will not prevent boot retry in the current session.
CONFIG_BOOT_RETRY_MIN is a safety limit (a minimum) on the above environment variable; however, it appears that negative (disabling) values get a pass through the check. This makes it harder to deduce the intended usage of this setting. If not explicitly set in the config, it is assigned the value of CONFIG_BOOT_RETRY_TIME.
CONFIG_RESET_TO_RETRY is an option which means that instead of directly resuming the autoboot sequence, the processor will reboot. This may in fact be the only supported way of using boot retry; it seems that a build error asking you to set it results if you do not.
Critical note: Except in a few patched forks, these are not KConfig options which you can put in your board_defconfig, but rather #define's which must go in a C header file of the code itself, specifically one applicable to the system configuration which you build.
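To make the interaction of these settings concrete, here is a minimal C model of how the timeout in effect appears to be derived. It is a sketch written from the behavior described above, not a copy of U-Boot's source: the function name effective_retry_time and the two macro values are made up for illustration.

```c
#include <stdlib.h>

/* Illustrative values only; a real build takes these from the board header. */
#define BOOT_RETRY_TIME 60   /* stands in for CONFIG_BOOT_RETRY_TIME */
#define BOOT_RETRY_MIN  30   /* stands in for CONFIG_BOOT_RETRY_MIN */

/*
 * Hypothetical helper: given the text of the bootretry variable
 * (or NULL if it is unset), return the timeout in seconds that
 * would be in effect. A negative result means boot retry is off.
 */
int effective_retry_time(const char *bootretry)
{
    int t = bootretry ? (int)strtol(bootretry, NULL, 10) : BOOT_RETRY_TIME;

    /* Negative (disabling) values pass through the minimum check. */
    if (t >= 0 && t < BOOT_RETRY_MIN)
        t = BOOT_RETRY_MIN;
    return t;
}
```

With these example values, an unset variable gives 60, "-1" stays -1 (disabled), and "5" is raised to the minimum of 30.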
Disable Boot Retry
If you saw the above timeout message and suspect that boot retry is at fault, there are a few possible ways to stop it.
First, if your u-boot supports saving environment variables persistently, you could
u-boot> setenv bootretry -1
u-boot> saveenv
and then reboot. A few systems may still have an ancient bug which prevents parsing a negative value, in which case you could use a large positive one, such as 3600 seconds (one hour).
But unfortunately, you cannot do this without saving the environment variable, as it is only read on startup. To enable using the environment variable as a temporary override for maintenance, you could do something like this to re-evaluate it each time the timeout is reset by a valid command:
--- a/common/bootretry.c
+++ b/common/bootretry.c
@@ -39,6 +39,7 @@ void bootretry_init_cmd_timeout(void)
*/
void bootretry_reset_cmd_timeout(void)
{
+ bootretry_init_cmd_timeout(); //pickup any environment change
endtime = endtick(retry_time);
}
This seems to work, in that you can set the bootretry to -1 for extended manual maintenance. It also seems you can set the bootretry to longer than default, but for reasons not understood, trying to set it shorter does not seem to work.
There does appear to be at least part of a designed-in mechanism whereby configuring CONFIG_AUTOBOOT_STOP_STR and then entering that string is supposed to stop the boot retry mechanism, but I couldn't get it to work or find any useful hits when searching for it.
To remove boot retry feature entirely
To remove the boot retry feature entirely, find where it is being defined in code applicable to your board (grep -r CONFIG_BOOT_RETRY * or similar), remove that, rebuild and reflash.
To achieve boot retry as a desired feature
First, put the necessary #define in a header applicable to your specific board, for example, if you had an Allwinner SoC you might do:
--- a/include/configs/sunxi-common.h
+++ b/include/configs/sunxi-common.h
@@ -16,6 +16,8 @@
#include <asm/arch/cpu.h>
#include <linux/stringify.h>
+#define CONFIG_BOOT_RETRY_TIME 60 //command prompt will timeout
+#define CONFIG_RESET_TO_RETRY //required for above on this chip
+
#ifdef CONFIG_OLD_SUNXI_KERNEL_COMPAT
/*
* The U-Boot workarounds bugs in the outdated buggy sunxi-3.4 kernels at the
Then rebuild u-boot, probably something like this:
make CROSS_COMPILE=~/path/to/gcc-xxx-yyy-zzz-/bin/xxx-yyy-zzz- clean
make CROSS_COMPILE=~/path/to/gcc-xxx-yyy-zzz-/bin/xxx-yyy-zzz- your_board_defconfig
make CROSS_COMPILE=~/path/to/gcc-xxx-yyy-zzz-/bin/xxx-yyy-zzz-
Repackage the result appropriately and flash it to your board
Warning: Always make sure you have a backup means of booting or flashing before over-writing the existing U-Boot!
Depending on your board, that might be something like the ability for the hardware itself to boot from an SD card or USB stick, to push code via a USB utility, or the ability to start the board via JTAG or similar. In a pinch, some SoCs will release the lines to an SPI flash if you hold them in reset, allowing you to use an external programmer, but others will not release the lines, meaning you have to desolder the flash chip. Loading a bad U-Boot into a board where you have no other way of injecting code but through U-Boot itself can result in a brick!
Without more details (such as platform, config and version), it's hard to say. Under normal circumstances the only timeout you have is the one for stopping the automatic boot. If the board resets reliably after N seconds of being on, it is likely that a watchdog is being triggered and U-Boot is not configured to know about it and either disable or periodically pet the watchdog to keep the system from resetting.
I don't understand why these CONFIG options aren't part of Kconfig, so that one could configure them with "make menuconfig" and also save the settings in _defconfig files.
That makes the most sense rather than having to add and customise header files.
It's 2021 now. I wonder if it's worth submitting a patch?
I was wondering how to create a timer in lua that works also in servers with no players on.
Timer.Simple or Timer.Create don't work, they need CurTime().
How could I do it?
Well one option is always to set the convar sv_hibernate_think to 1.
That is also the option provided on the official Wiki as shown here.
Depends on what's available. You likely can't import any extra libs, and Lua's capabilities have probably been nerfed.
If standard clock capabilities still exist, you can do something with
local init, pause = os.clock(), 3
while os.clock() - init < pause do end
I don't know your exact use; this could be made into a function if need be. It will eat up clock cycles, since it is a busy-wait. If coroutines exist, you might be able to have another script running in the background while occasionally checking on the timer.
What is the real difference between the LINUX_REBOOT_CMD_HALT and LINUX_REBOOT_CMD_POWER_OFF arguments to the reboot() system call (resp. the RB_HALT_SYSTEM and RB_POWER_OFF arguments given to its wrapper function)?
The reboot(2) manual page has the following descriptions (differences emphasized):
RB_HALT_SYSTEM
LINUX_REBOOT_CMD_HALT
(RB_HALT_SYSTEM, 0xcdef0123; since Linux 1.1.76). The message "System halted." is printed, and the system is halted. Control is given to the ROM monitor, if there is one. If not preceded by a sync(2), data will be lost.
LINUX_REBOOT_CMD_POWER_OFF
(RB_POWER_OFF, 0x4321fedc; since Linux 2.1.30).
The message "Power down." is printed, the system is stopped, and all power is removed from the system, if possible. If not preceded by a sync(2), data will be lost.
Reading the descriptions, a few questions come up:
What is the difference between halted and stopped?
Would a reboot(RB_HALT_SYSTEM) call not remove power from the system?
Where would the "System halted." and "Power down." messages be printed?
I don't think there is a difference; those words are synonyms in common English and I think this documentation is just using their English meaning, not as specific technical terms.
Correct, a halt does not remove power; that's exactly what the docs are trying to tell you.
On the console and/or in the kernel log, duh. Wherever kernel messages are normally printed, like during bootup.
You can easily try these for yourself to see what they do; the user-space shutdown(8) command has -H (halt) and -P / -h (poweroff) options, as well as -r. Read the man page. I assume it eventually makes a reboot(2) system call, or causes init to make one, after a sync.
And yes, the traditional shutdown -h command is halt + power off, i.e. POWER_OFF. Back in the old days, computers weren't able to power themselves off, but these days powering off is usually what people think of as a non-reboot shutdown, especially on systems where the kernel can't "return" to a BIOS / firmware command interface.
On a PC, one of the few use-cases I could imagine for halt without poweroff would be to insert a USB drive or CD before pressing the reset button (or ctrl+alt+delete). But maybe you don't want the currently-booted Linux kernel to react to the new hardware at all, so you want to halt Linux first.
You could poweroff to do this, but you don't need to and there's no need to start/stop your rotating disks and put extra wear on their motors.
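For reference, here is a minimal sketch of what a user-space caller of the glibc wrapper might look like. The helper names shutdown_cmd and halt_or_power_off are made up for this example; only sync(2), reboot(2), RB_HALT_SYSTEM and RB_POWER_OFF come from the man page. Don't run halt_or_power_off on a machine you care about: with sufficient privileges it really does stop the system.

```c
#define _DEFAULT_SOURCE   /* for sync() and reboot() declarations under glibc */
#include <stdbool.h>
#include <unistd.h>
#include <sys/reboot.h>

/* Hypothetical helper: pick the command, power off if requested, else just halt. */
int shutdown_cmd(bool power_off)
{
    return power_off ? (int)RB_POWER_OFF : (int)RB_HALT_SYSTEM;
}

/*
 * Hypothetical helper: flush filesystem buffers, then halt or power off.
 * Needs root (CAP_SYS_BOOT). On success it does not return;
 * on failure it returns -1 with errno set.
 */
int halt_or_power_off(bool power_off)
{
    sync();   /* the man page warns that data is lost without a prior sync */
    return reboot(shutdown_cmd(power_off));
}
```

The only difference between the two paths is the command value handed to reboot(); whether the power is actually removed is then up to the kernel and the hardware.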
I'm working on a Freescale microcontroller. This microcontroller has several reset sources (e.g. clock monitor reset, watchdog reset, and so on).
Suppose my microcontroller is reset by the watchdog. How can I save some data just before the reset happens? For example, how can I find out where the program counter was just before the watchdog reset? With this I want to find where my error is (in other words, the long-running process) that causes the watchdog reset.
Most Freescale MCUs work like this:
RAM is preserved after watchdog reset. But probably not after LVD reset and certainly not after power-on reset. This is in most cases completely undocumented.
The MCU will either have a status register where you can check the reset cause (for example HCS08, MPC5x, Kinetis), or it will have special reset vectors for different reset causes (for example HC11, HCS12, Coldfire).
There is no way to save anything upon reset. Reset happens and only afterwards can you find out what caused the reset.
It is however possible to reserve a chunk of RAM as a special segment. Upon power-on reset, you can initialize this segment by setting everything to zero. If you get a watchdog reset, you can assume that this RAM segment is still valid and intact. So you don't initialize it, but leave it as it is. This method enables you to save variable values across reset. Probably - this is not well documented for most MCU families. I have used this trick at least on HCS08, HCS12 and MPC56.
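A minimal sketch of that idea in C follows. The struct layout, the magic value and the checkpoint field are my own illustrative choices, and placing the object in a RAM section that the startup code leaves untouched is linker- and compiler-specific, so it is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PERSIST_MAGIC 0x5AFEC0DEu   /* arbitrary marker value */

/* Data meant to survive a watchdog reset. On a real target this object
 * would live in a RAM section that the startup code does not initialise. */
struct persist {
    uint32_t magic;        /* PERSIST_MAGIC when the contents are valid */
    uint32_t checkpoint;   /* hypothetical: id of the last code section reached */
    uint32_t wdt_resets;   /* number of watchdog resets seen so far */
};

/* Call once, early at startup. watchdog_reset comes from the MCU's reset
 * status register or reset vector. Returns true if old contents were kept. */
bool persist_init(struct persist *p, bool watchdog_reset)
{
    assert(p != NULL);
    if (!watchdog_reset || p->magic != PERSIST_MAGIC) {
        /* Power-on (or unknown) reset, or garbage in RAM: start clean. */
        memset(p, 0, sizeof *p);
        p->magic = PERSIST_MAGIC;
        return false;
    }
    p->wdt_resets++;       /* contents survived; leave them as they are */
    return true;
}
```

If the main loop writes a distinct number into checkpoint before each long-running section, the value found after a watchdog reset at least narrows down where the program was stuck, even though the program counter itself is gone.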
As for the program counter, you are out of luck. It is reset with no means to recover it. Meaning that the only way to find out where a watchdog reset occurred is the tedious old school way of moving a breakpoint bit by bit down your code, run the program and check if it reached the breakpoint.
Though in the case of modern MCUs like MPC56 or Cortex M, you simply check the trace buffer and see which code caused the reset. Not only do you get the PC, you get to see the C source code. But you might need a professional, Eclipse-free tool chain to do this.
Depending on your microcontroller you may get Reset Reason, but getting previous program counter (PC/IP) after reset is not possible.
Most modern microcontrollers have a provision for a watchdog interrupt instead of a reset.
You can configure the watchdog peripheral to raise an interrupt. In that ISR you can check the context stored on the stack. (A JTAG debugger can help you inspect the call stack.)
There are other debugging methods available if your microcontroller doesn't support the above.
For example:
In a simple while(1)-based architecture, you can use a hardware timer and restart it after each section of code. In the timer ISR you will then know which code section took longer than the timer period.
Two things:
Write a log! And rotate that log to keep the last 30 min. or whatever reasonable amount of time you think you need to reproduce the error. Where the log stops, you can see what happened just before that. Even in production-level devices there is some level of logging.
(Less practical) You can attach a debugger to nearly every microcontroller and step through the code. Probably put a breakpoint that is hit just before you enter the critical section of code. Some IDEs/uCs allow "data breakpoints" that get triggered when certain variables contain certain values.
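As a rough sketch of the "rotating log" idea, here is a tiny ring buffer of event codes in C. All names and the capacity are illustrative assumptions; on a real device the buffer would have to live somewhere that survives the reset (preserved RAM, FRAM, flash) for it to be of any use afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 8u   /* illustrative size; must be greater than zero */

struct event_log {
    uint16_t entries[LOG_CAPACITY];
    size_t   head;    /* index of the next slot to write */
    size_t   count;   /* number of valid entries, at most LOG_CAPACITY */
};

/* Append an event code, overwriting the oldest entry once the log is full. */
void log_put(struct event_log *elog, uint16_t code)
{
    assert(elog != NULL);
    elog->entries[elog->head] = code;
    elog->head = (elog->head + 1) % LOG_CAPACITY;
    if (elog->count < LOG_CAPACITY)
        elog->count++;
}

/* Fetch the i-th most recent entry (0 = newest).
 * Returns 0 on success, -1 if fewer than i + 1 entries are stored. */
int log_recent(const struct event_log *elog, size_t i, uint16_t *out)
{
    assert(elog != NULL && out != NULL);
    if (i >= elog->count)
        return -1;
    *out = elog->entries[(elog->head + LOG_CAPACITY - 1 - i) % LOG_CAPACITY];
    return 0;
}
```

After a reset, reading entry 0 tells you the last event that was logged before things went wrong.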
Disclaimer: I am not familiar with the exact microcontroller that you are using.
It is written in your manual.
I don't know that specific processor but in most microprocessors a watchdog reset is a soft reset, meaning that certain registers will keep information about the reset source and sometimes reason.
You need to post more specific information on your Freescale μC for this to be answered properly.
Even if you could get the program counter from before the reset, it wouldn't be advisable to blindly set the program counter to that value after reset, as the stack and heap contents, as well as the data itself, are likely to have changed.
It depends on what you want to preserve after reset: certain behaviour, or data? Volatile memory may or may not have been cleared by a watchdog reset (see your uC datasheet), and you will be able to detect a reset by checking the reset registers (again, see your uC datasheet). By detecting a reset and checking volatile memory, you may be able to prepare your uC to restart in the way you'd prefer after the unlikely event of a reset. You could create a global variable and set it to a known value; after a reset, compare the variable against that known value. If it still matches, you could assume that other memory may also be intact. If volatile memory is not an option, you'll need to look at the datasheet for non-volatile options; however, it is advisable not to write to non-volatile memory continually because of its limited write endurance.
The only reliable solution is to use a debugger with trace capability if your chip supports embedded instruction trace.
Some devices have an option to redirect the watchdog timeout to an interrupt rather than a reset. This would allow you to write the watchdog timeout handler much like an exception handler and dump or store the stack information, including the return address, which will indicate where the interrupt occurred.
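As an illustration only, assuming an ARM Cortex-M style core (where the hardware pushes r0-r3, r12, lr, pc and xpsr, in that order, on exception entry), pulling the return address out of the stacked frame could look like the sketch below. The names wdt_record and wdt_capture are hypothetical, and how the handler obtains the stacked stack pointer (usually a few lines of assembly) is not shown.

```c
#include <stdint.h>

/* Word offsets within the basic exception frame (no floating-point state). */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

/* Hypothetical record; keep it somewhere that survives the following reset. */
struct wdt_record {
    uint32_t pc;   /* where the interrupted code was executing */
    uint32_t lr;   /* its link register, i.e. roughly who called it */
};

/* Call from the watchdog interrupt handler with the stack pointer that
 * was active when the interrupt was taken. */
void wdt_capture(const uint32_t *stacked_sp, struct wdt_record *rec)
{
    rec->pc = stacked_sp[FRAME_PC];
    rec->lr = stacked_sp[FRAME_LR];
}
```

Other architectures stack different registers in a different order, so the offsets have to be taken from the core's reference manual.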
However in some cases, neither solution is a reliable method of achieving your aim. In a multi-tasking environment or system with interrupt handlers, the code running when the watchdog timeout occurs may not be the process that is causing the problem.
I am learning embedded systems on the ARM9 processor (SAM9G20). I am more familiar with procedural programming for general purpose. Thus what I am doing is going through the data sheet and learning what registers there are and how to manipulate them.
My question is, how do I know when the computer reset? I know that there is a Reset Controller that manages resets. A register called the Status Register (RSTC_SR) stores the source of the reset. Do I need to keep periodically reading this register?
My solution is to store the number of resets in the FRAM (or start by setting it to 0). Once a reset happens, I compare this variable with the register value in my main function. If the register value is higher than the stored one, then obviously it reset. However, I am sure there is a more optimized way (perhaps using interrupts). Or is this how it's usually done?
You do not need to periodically check, since every time the machine is reset your program will re-start from the beginning.
Simply add checks to the startup code, i.e. early in main(), as needed. If you want to figure out things like how often you reset, then that is more difficult since typically (no experience with SAMs, I'm an STM32 type of guy) on-board timers etc will also reset. Best would be some kind of real-world independent clock, like an RTC that you can poll and save the value of. Please consider if you really need this, though.
A simple solution is to exploit the structure of your code.
Many code bases for embedded take this form:
int main(void)
{
    // setup stuff here
    while (1)
    {
        // handle stuff here
    }
    return 0;
}
You can exploit that the code above while(1) is only run once at startup. You could increment a counter there, and save it in non-volatile storage. That would tell you how many times the microcontroller has reset.
Another example is on Arduino, where the code is structured such that a function called setup() is called once, and a function called loop() is called continuously. With this structure, you could increment the variable in the setup()-function to achieve the same effect.
Whenever your processor starts up, it has by definition come out of reset. What the reset status register does is indicate the source or reason for the reset, such as power-on, watchdog-timer, brown-out, software-instruction, reset-pin etc.
It is not a matter of knowing when your processor has reset - that is implicit by the fact that your code has restarted. It is rather a matter of knowing the cause of the reset.
You need not monitor or read the reset status at all if your application has no need of it, but in some applications it is a useful diagnostic, for example to maintain a count of the various reset causes, as it may be indicative of the stability of your system software, its power supply or the behaviour of the operators. Ideally you'd want to log the cause with a timestamp, assuming you have a suitable RTC source early enough in your start-up. The timing of resets is often a useful diagnostic where simply counting them may not be.
Any counting of the reset cause should occur early in your code start-up, before any interrupts are enabled (because an interrupt may itself cause a reset). This may require you to implement the counters in the start-up code before main() is invoked, in cases where the start-up code might enable interrupts, for stdio or filesystem support for example.
A way to do this is to run the code in debug mode (if you got a debugger for the SAM). After a reset the program counter(PC) points to the address where your code starts.
I am writing a system monitor for Linux and want to include some watchdog functionality. In the kernel, you can configure the watchdog to keep going even if /dev/watchdog is closed. In other words, if my daemon exits normally and closes /dev/watchdog, the system would still re-boot 59 seconds later. That may or may not be desirable behavior for the user.
I need to make my daemon aware of this setting because it will influence how I handle SIGINT. If the setting is on, my daemon would need to (preferably) start an orderly shutdown on exit or (at least) warn the user that the system is going to reboot shortly.
Does anyone know of a method to obtain this setting from user space? I don't see anything in sysconf() to get the value. Likewise, I need to be able to tell if the software watchdog is enabled to begin with.
Edit:
Linux provides a very simple watchdog interface. A process can open /dev/watchdog; once the device is opened, the kernel begins a 60-second countdown to reboot unless some data is written to that file, in which case the clock resets.
Depending on how the kernel is configured, closing that file may or may not stop the countdown. From the documentation:
The watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
I need to be able to tell if CONFIG_WATCHDOG_NOWAYOUT was set from within a user space daemon, so that I can handle the shutdown of said daemon differently. In other words, if that setting is high, a simple:
# /etc/init.d/mydaemon stop
... would reboot the system in 59 seconds, because nothing is writing to /dev/watchdog any longer. So, if it's set high, my handler for SIGINT needs to do additional things (i.e. warn the user at the least).
I can not find a way of obtaining this setting from user space :( Any help is appreciated.
AHA! After digging through the kernel's linux/watchdog.h and drivers/watchdog/softdog.c, I was able to determine the capabilities of the softdog ioctl() interface. Looking at the capabilities that it announces in struct watchdog_info:
static struct watchdog_info ident = {
    .options = WDIOF_SETTIMEOUT |
               WDIOF_KEEPALIVEPING |
               WDIOF_MAGICCLOSE,
    .firmware_version = 0,
    .identity = "Software Watchdog",
};
It does support a magic close that (seems to) override CONFIG_WATCHDOG_NOWAYOUT. So, when terminating normally, I have to write a single char 'V' to /dev/watchdog then close it, and the timer will stop counting.
A simple ioctl() on a file descriptor to /dev/watchdog asking WDIOC_GETSUPPORT allows one to determine if this flag is set. Pseudo code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
    struct watchdog_info info;
    int fd = open("/dev/watchdog", O_WRONLY);

    if (fd == -1) {
        perror("open");
        // abort, timer did not start - no additional concerns
        return 1;
    }
    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == -1) {
        perror("ioctl");
        // abort, but you probably started the timer! See below.
        return 1;
    }
    if (info.options & WDIOF_MAGICCLOSE) {
        printf("Watchdog supports magic close char\n");
        // You have started the timer here! Handle that appropriately.
    }
    return 0;
}
When working with hardware watchdogs, you might want to open with O_NONBLOCK so that ioctl(), not open(), blocks (hence detecting a busy card).
If WDIOF_MAGICCLOSE is not supported, one should just assume that the soft watchdog is configured with NOWAYOUT. Remember, just opening the device successfully starts the countdown. If all you're doing is probing to see if it supports magic close and it does, then magic close it. Otherwise, be sure to deal with the fact that you now have a running watchdog.
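The magic close itself is tiny. A minimal sketch, with watchdog_magic_close being a name I made up for it:

```c
#include <unistd.h>

/* Write the magic character 'V' and then close the descriptor.
 * Returns 0 if both steps succeeded, -1 if the write or the close failed.
 * The descriptor is closed either way. */
int watchdog_magic_close(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);

    return (n == 1 && rc == 0) ? 0 : -1;
}
```

Call it on the descriptor you got from opening /dev/watchdog when the daemon shuts down normally. If it returns -1, assume the countdown is still running.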
Unfortunately, there's no real way to know for sure without actually starting it, at least not that I could find.
A watchdog guards against hard-locking the system, either because of a software crash or a hardware failure.
What you need is a daemon-monitoring daemon (dmd). Check out 'monit'.
I think the watchdog device drivers are really intended for use on embedded platforms (or at least well controlled ones) where the developers will have control of which kernel is in use.
This could be considered to be an oversight, but I think it is not.
One other thing you could try, if the watchdog was built as a loadable module, unloading it will presumably abort the shutdown?