Handling Y2.036K & Y2.038K bugs - ntp

I am currently working on a project with a requirement that our software must operate until at least 2050. Recently we have run into problems dealing with the Y2.036K "bug" in the NTP protocol and also the Y2.038K bug. Basically, our software must continue to run past these dates with all data recorded using correct time stamps. Given that there is currently no solution to either of these bugs, a workaround must be employed.
It is critical that our software keeps running after these two events and records dates correctly. It is not critical that the OS system time be correct. Given that we are using Java, we should be able to handle dates relative to the prime epoch of 1900 after it rolls over. However, the JVM will not even run if the system time is set before the Unix epoch of 1970! It just crashes.
To add fuel to the fire, the NTP server is supplied by another vendor and we have no control over it. So using another protocol or modifying the server to handle any of this is not possible.
A creative solution is required. Needless to say, some deep voodoo must take place. We have considered the following:
Modify the ntpd client software to somehow co-operate with the NTP server and offset the local time from a date later than the Unix epoch of 1970 rather than from 1900, thus allowing the JVM to run without crashing on initialization. All time stamps will then be handled relative to our chosen rollover date. (So basically, make sure we roll over to a date later than the Unix epoch; a rough sketch of the era handling follows below.)
Allow the NTP-corrected time to roll over to the 1900 epoch and find a fix so that the JVM will not crash.
Has anyone else tackled this issue? And also, are there any other issues that may occur which I have not foreseen, making one or both of these solutions not feasible at all?
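For reference, the usual client-side trick behind option 1 is NTP "era" disambiguation: per RFC 4330, if the most significant bit of the 32-bit seconds field is clear the timestamp is assumed to be after the 2036 rollover, and 2^32 seconds are added before converting to Unix time. A rough sketch (the function name is made up, and a 64-bit result type is assumed to be available):

#include <stdint.h>

#define NTP_UNIX_OFFSET 2208988800ULL   /* seconds from 1900-01-01 to 1970-01-01 */

int64_t ntp_seconds_to_unix(uint32_t ntp_seconds)
{
    uint64_t seconds_since_1900 = ntp_seconds;

    if ((ntp_seconds & 0x80000000u) == 0) {
        /* MSB clear: assume era 1, i.e. after the 2036 rollover. */
        seconds_since_1900 += 4294967296ULL;   /* 2^32 */
    }
    return (int64_t)(seconds_since_1900 - NTP_UNIX_OFFSET);
}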

Install your software on a 64-bit Linux with a 64-bit JVM. time_t and friends are 64-bit there; adjust the time to past 2038 and see if stuff still works. If you're good, toss away NTP and find a GPS or other source which can be used as a precise clock and is guaranteed not to have 32-bit problems, then interface your software to read/sync time from that.
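A quick sanity check along those lines (a minimal sketch assuming a POSIX system; with a 32-bit time_t the value below wraps and the printed date will be wrong):

#include <stdio.h>
#include <time.h>

int main(void)
{
    printf("sizeof(time_t) = %zu bytes\n", sizeof(time_t));

    /* 2,200,000,000 seconds after 1970-01-01 falls in 2039, i.e. past the
     * 32-bit signed rollover on 2038-01-19. */
    time_t past_2038 = (time_t)2200000000LL;
    struct tm tm_utc;
    char buf[64];

    if (gmtime_r(&past_2038, &tm_utc) == NULL) {
        puts("gmtime_r failed: time_t cannot hold this value");
        return 1;
    }
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", &tm_utc);
    printf("2200000000 s after the epoch is %s\n", buf);
    return 0;
}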

Related

Can bad C code cause a Blue Screen of Death?

I am a new coder in C, having recently moved over from Python, but I still like to challenge myself with fairly ambitious projects (like a chess program), and I have found that my computer suffers an unusual number of BSODs, both when I am running a program and when I am not (admittedly, attempting to use the entirety of my memory as a hash table may not have been the greatest idea).
So my question is: are these most likely caused by my crappy C code, or is it more likely that my 3-year-old, overworked laptop is the culprit?
If it could be the code, what are the big things I should avoid doing so as to prevent this?
A BSOD usually contains some information as to what caused it.
What information it contains, and how exactly it is displayed depends on the version of Windows you are running.
As can be seen from the list here:
https://hetmanrecovery.com/recovery_news/bsod-errors
Most BSOD errors come from device / driver / kernel code, and not from your typical userland program.
That said, it might be possible to trigger BSOD if your code uses particularly low level windows API, especially if you run it with administrator privileges.
Note that simply filling up memory will result in allocations failing, and possibly your program crashing, but not the whole OS.
Also, Windows does place limits on how much memory an individual process can allocate.
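To illustrate the allocation-failure point (a deliberately wasteful sketch; on Windows the loop ends when the process hits its commit limit, while on a 64-bit Linux with overcommit it may instead run for a very long time):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t chunk = 64 * 1024 * 1024;   /* 64 MiB per allocation */
    size_t total = 0;

    /* Leaking on purpose: keep allocating until malloc gives up.  The
     * process fails gracefully; the OS itself does not crash. */
    while (malloc(chunk) != NULL) {
        total += chunk;
    }
    printf("malloc failed after roughly %zu MiB\n", total / (1024 * 1024));
    return 0;
}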
One final note:
"3 year old laptop" does not provide enough information to say anything about your hardware, since there are different tiers of laptops available, and some of the high-end 3-year-old ones will still perform better than a mid-tier one bought yesterday.
As a troubleshooting measure, I would recommend backing up your data, making a clean install of your OS (aka "format the machine"), then making sure all your drivers are up to date.
You may also want to try hardware diagnostic tools such as MemTest86, check the SMART data on your storage, etc.
It's not supposed to be possible for anything you do in an ordinary "user space" program to crash the whole computer. Something else must also be wrong. Here are some possibilities:
If you are making the computer do CPU- and RAM-intensive work for long periods, you may stress the hardware to the point where a marginally defective component fails. Usually it's either the RAM, the power supply, or the cooling fans at fault.
Make sure your power supply is rated for all of the kit you have, running simultaneously. Make sure you have enough airflow for the amount of heat you're generating; check for dust-clogged heatsinks and fans that aren't actually spinning. If you have more than one RAM stick, take one out at a time and see if that makes the problem disappear.
I'd like to tell you to get error-correcting RAM if you don't have it already, but for infuriating market differentiation reasons you'd have to replace the motherboard and CPU as well. It's still worth doing, in the long run, but it amounts to replacing the whole computer.
You may be tickling a bug in the OS or the drivers. The most probable culprit is the GPU driver, particularly if your program does anything graphical. Regrettably, all you can do about this is make sure you're fully patched up.

Posix timezone string to Olson string

We have an embedded Linux system where the user needs to be able to permanently set the system's timezone by supplying a POSIX string (e.g. WEuropeStandardTime-1DaylightTime,M3.5.0,M10.5.0/3).
The user interacts with the system over a webservice we develop so we have complete control of the implementation. I'm looking for a C/C++ solution.
I've been looking at timedatectl set-timezone, but it only accepts Olson timezone descriptions, not POSIX timezone strings. I was thinking I could parse the tzdata to find a match for the POSIX timezone string, but before starting down that path I'd like to know if there is a better way or if there is already a library to do this conversion.
I've discounted setting the TZ environment variable as it is an override for the system date and time set by timedatectl and it feels like a bodge. Also, I'm not sure how I'd set it early enough that all software running from boot would see the same time.
What you ask is not possible. There is much more information in the data represented behind an IANA (aka "Olson") time zone identifier than can fit in a POSIX time zone string. In particular, a POSIX time zone string can only contain one single pair of transition rules. It provides no facilities for tracking how those rules have changed over time or how they might be scheduled to change in the future. It also doesn't provide for cases where the standard time offset has changed, or where there is more than one pair of transitions in a single year. You can read more about this in the section "POSIX style time zones" towards the bottom of the timezone tag wiki.
However, since you said the user interacts with the system over a web service, you might instead consider projecting a POSIX string. The user would pick an IANA time zone for the device (or you could look it up by location), and you'd store that. Then when communicating with the device, you'd generate a POSIX string to deliver for use on the device over a given period. You'd want to periodically update it, perhaps every time the device checks in or is updated (depending on your scenario).
I know of one commercial offering with this capability built-in: the Azure Maps Timezone API. In its responses, you'll see a PosixTz string. (This was added specifically for IoT scenarios.)
Alternatively, you could do this yourself in your own server-side code. The logic is a bit tricky, but it is indeed possible.
If your back end is .NET - you're in luck. My TimeZoneConverter.Posix library can do this for you. For example:
string posix = PosixTimeZone.FromIanaTimeZoneName("Australia/Sydney");
// Result: "AEST-10AEDT,M10.1.0,M4.1.0/3"
If your back end is JavaScript, there's some unsupported code in Moment-Timezone issue #314 that you can leverage or adapt.
If your back end is something else, I'm not aware of readily available solutions. However, you may be able to port the code in either of the above to your platform.
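If the back end (or the device itself) is C on Linux, one more option I'm aware of: compiled tzdata files in TZif version 2 or later end with a POSIX-style TZ string footer (see RFC 8536), so you can project an IANA zone to a POSIX string by reading the last line of the zone file. A rough sketch, with minimal error handling and an assumed zoneinfo path:

#include <stdio.h>

int main(void)
{
    /* Path is an assumption; any compiled zone under /usr/share/zoneinfo works. */
    const char *path = "/usr/share/zoneinfo/Australia/Sydney";
    static char buf[65536];            /* large enough for typical TZif files */

    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return 1; }
    size_t n = fread(buf, 1, sizeof(buf), f);
    fclose(f);

    /* TZif v2+ files end with '\n' <POSIX TZ string> '\n'.  The file is
     * binary, so scan backwards by hand instead of using str* functions. */
    if (n < 2 || buf[n - 1] != '\n') { puts("no POSIX footer found"); return 1; }
    size_t i = n - 1;
    while (i > 0 && buf[i - 1] != '\n') i--;
    if (i == 0) { puts("no POSIX footer found"); return 1; }

    buf[n - 1] = '\0';
    printf("POSIX TZ string: %s\n", &buf[i]);
    return 0;
}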

How to measure execution time in FreeRTOS?

I have a version of FreeRTOS which comes with the Tracealyzer tool, and I need to compare how it affects the performance of the whole system (by how much it slows it down). I have 2 simple tasks which run and delay for a short time. I run the system for some number of iterations, once with Tracealyzer started and once without.
I am aware of vTaskGetRunTimeStats(), but as far as I understand it only measures the run time of one task, not of the entire system. At the moment I am using the PowerShell tool Measure-Command, but I would like to use a built-in tool in FreeRTOS.
How do I measure the execution time for the entire system (all tasks, not just one) in FreeRTOS?
vTaskGetRunTimeStats() will provide stats for all tasks, provided you have configured the clock it uses. This image is an example of the data it provides:
https://freertos.org/rtos-run-time-stats.jpg
If you just want the raw data then use https://freertos.org/uxTaskGetSystemState.html
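A rough sketch of the uxTaskGetSystemState() route (assumes configUSE_TRACE_FACILITY and configGENERATE_RUN_TIME_STATS are enabled in FreeRTOSConfig.h, that the port's run-time-stats timer macros are defined, and that printf is retargeted on your hardware; in newer FreeRTOS versions the total-run-time parameter is typed configRUN_TIME_COUNTER_TYPE):

#include "FreeRTOS.h"
#include "task.h"
#include <stdio.h>

void vPrintRunTimeStats(void)
{
    UBaseType_t uxCount = uxTaskGetNumberOfTasks();
    TaskStatus_t *pxStatus = pvPortMalloc(uxCount * sizeof(TaskStatus_t));
    uint32_t ulTotalRunTime;

    if (pxStatus == NULL) {
        return;
    }

    /* Fills one TaskStatus_t per task and reports the total run time of
     * the whole system (in run-time-stats clock ticks). */
    uxCount = uxTaskGetSystemState(pxStatus, uxCount, &ulTotalRunTime);

    for (UBaseType_t i = 0; i < uxCount; i++) {
        printf("%-16s %10lu ticks\n",
               pxStatus[i].pcTaskName,
               (unsigned long)pxStatus[i].ulRunTimeCounter);
    }
    printf("total run time   %10lu ticks\n", (unsigned long)ulTotalRunTime);

    vPortFree(pxStatus);
}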

GWAN Floating point exception

When trying to run G-WAN on Ubuntu 12.04 LTS I sometimes get a "Floating point exception" error. Sometimes it will happen many times in a row, and sometimes it will start and run fine a few times in a row. But it always happens now and then; it seems to be random.
Nagi's questions above are relevant: it would help to know under which conditions you are using G-WAN.
The link Nagi provided is also relevant but if you were facing a script error then the crash would be constant.
There's another known case which may lead to an erratic floating point error at startup, and that's hypervisors.
You are most probably experiencing the second case, for which workarounds have been implemented in May last year (G-WAN v4.5+).
Taking place between the hardware and the OS, many hypervisors are breaking the CPU description reported by the Linux kernel, and they also happen to break CPU features(!) and system features like timers, memory allocation, etc.
In short, you are likely running G-WAN v4.3.28 and need to upgrade to a more recent release. We give more recent releases to registered users and to people who contribute to the project with code, ideas, etc.
The next public release will be made available after our G-WAN-based Cloud services are shipped (later this year).

How do you profile your code?

I hope not everyone is using Rational Purify.
So what do you do when you want to measure:
time taken by a function
peak memory usage
code coverage
At the moment, we do it manually (using log statements with timestamps and another script to parse the log and output to Excel. Phew...)
What would you recommend? Pointing to tools or any techniques would be appreciated!
EDIT: Sorry, I didn't specify the environment first. It's plain C on a proprietary mobile platform.
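For the "time taken by a function" item, the manual approach described above can at least be wrapped in a small macro so the logging stays in one place. A sketch (clock() stands in for whatever tick counter the proprietary platform actually exposes; TIME_CALL and parse_frame are made-up names):

#include <stdio.h>
#include <time.h>

#define TIME_CALL(label, call)                                        \
    do {                                                              \
        clock_t start_ = clock();                                     \
        clock_t end_;                                                 \
        call;                                                         \
        end_ = clock();                                               \
        printf("%s: %.3f ms\n", (label),                              \
               1000.0 * (double)(end_ - start_) / CLOCKS_PER_SEC);    \
    } while (0)

/* Example use:  TIME_CALL("parse_frame", parse_frame(buf, len)); */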
I've done this a lot. If you have an IDE, or an ICE, there is a technique that takes some manual effort, but works without fail.
Warning: modern programmers hate this, and I'm going to get downvoted. They love their tools. But it really works, and you don't always have the nice tools.
I assume in your case the code is something like DSP or video that runs on a timer and has to be fast. Suppose what you run on each timer tick is subroutine A. Write some test code to run subroutine A in a simple loop, say 1000 times, or long enough to make you wait at least several seconds.
While it's running, randomly halt it with a pause key and sample the call stack (not just the program counter) and record it. (That's the manual part.) Do this some number of times, like 10. Once is not enough.
Now look for commonalities between the stack samples. Look for any instruction or call instruction that appears on at least 2 samples. There will be many of these, but some of them will be in code that you could optimize.
Do so, and you will get a nice speedup, guaranteed. The 1000 iterations will take less time.
The reason you don't need a lot of samples is you're not looking for small things. Like if you see a particular call instruction on 5 out of 10 samples, it is responsible for roughly 50% of the total execution time. More samples would tell you more precisely what the percentage is, if you really want to know. If you're like me, all you want to know is where it is, so you can fix it, and move on to the next one.
Do this until you can't find anything more to optimize, and you will be at or near your top speed.
You probably want different tools for performance profiling and code coverage.
For profiling I prefer Shark on MacOSX. It is free from Apple and very good. If your app is vanilla C you should be able to use it, if you can get hold of a Mac.
For profiling on Windows you can use LTProf. Cheap, but not great:
http://successfulsoftware.net/2007/12/18/optimising-your-application/
(I think Microsoft are really shooting themselves in the foot by not providing a decent profiler with the cheaper versions of Visual Studio.)
For coverage I prefer Coverage Validator on Windows:
http://successfulsoftware.net/2008/03/10/coverage-validator/
It updates the coverage in real time.
For complex applications I am a great fan of Intel's VTune. It is a slightly different mindset to a traditional profiler that instruments the code. It works by sampling the processor to see where the instruction pointer is, 1,000 times a second. It has the huge advantage of not requiring any changes to your binaries, which as often as not would change the timing of what you are trying to measure.
Unfortunately it is no good for .NET or Java, since there isn't a way for VTune to map the instruction pointer to a symbol like there is with traditional code.
It also allows you to measure all sorts of other processor/hardware centric metrics, like clocks per instruction, cache hits/misses, TLB hits/misses, etc which let you identify why certain sections of code may be taking longer to run than you would expect just by inspecting the code.
If you're doing an 'on the metal' embedded 'C' system (I'm not quite sure what 'mobile' implied in your posting), then you usually have some kind of timer ISR, in which it's fairly easy to sample the code address at which the interrupt occurred (by digging back in the stack or looking at link registers or whatever). Then it's trivial to build a histogram of addresses at some combination of granularity/range-of-interest.
It's usually then not too hard to concoct some combination of code/script/Excel sheets which merges your histogram counts with addresses from your linker symbol/list file to give you profile information.
If you're very RAM-limited, it can be a bit of a pain to collect enough data for this to be both simple and useful, but you would need to tell us more about your platform.
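A hedged sketch of that timer-ISR sampling idea (how you obtain the interrupted PC is platform-specific, so here it is simply passed in by a hypothetical ISR wrapper; the base address, bucket size, and table size are all assumptions):

#include <stdint.h>

#define PROFILE_BASE   0x08000000u   /* start of the code region (assumption) */
#define PROFILE_BUCKET 64u           /* bytes of code per histogram bin */
#define PROFILE_BINS   4096u

static uint32_t profile_hist[PROFILE_BINS];

/* Called from the timer ISR with the program counter that was interrupted. */
void profile_sample(uint32_t interrupted_pc)
{
    uint32_t bin = (interrupted_pc - PROFILE_BASE) / PROFILE_BUCKET;
    if (bin < PROFILE_BINS) {
        profile_hist[bin]++;
    }
}

Offline, the bin counts are merged with the addresses in the linker map file to produce the profile.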
nProf - Free, does that for .NET.
Gets the job done, at least enough to see the 80/20. (20% of the code, taking 80% of the time)
Windows (.NET and Native Exes): AQTime is a great tool for the money. Standalone or as a Visual Studio plugin.
Java: I'm a fan of JProfiler. Again, can run standalone or as an Eclipse (or various other IDEs) plugin.
I believe both have trial versions.
The Google Perftools are extremely useful in this regard.
I use DevPartner with MSVC 6 and XP.
How are any tools going to work if your platform is a proprietary OS? I think you're doing the best you can right now.
