DateTime as a key (C)

I'm trying to use a datetime type as a key in a B-tree BerkeleyDB database. My goals:
minimum overhead for datetime storage
key comparison by date (to retrieve ranges)
reasonable speed
How can I represent a datetime in the most compact form and still use bsddb's default key-comparison algorithm?
Is it hard to do this in C and create a small Python extension for such tasks? I'm not experienced in C and am only able to understand small C snippets (and copy-paste them).

What range of datetime values are you interested in? And what resolution on the time?
As fge indicated in a comment, if you want 1-second resolution over a period limited to 1902-2037, then you can use a 32-bit signed integer holding the number of seconds since the Unix Epoch, which is 1970-01-01 00:00:00 +00:00 (midnight on 1st January 1970 in UTC). If you want a wider range, then you should probably use a 64-bit signed integer relative to the Unix Epoch. If you want sub-second accuracy, store an additional 32-bit signed integer holding the number of nanoseconds within the second. Note that for a negative time (before 1970), the fractional seconds should be negative too.
One reason for suggesting these representations is that the value can easily be found via standard Unix (POSIX) interfaces, such as time() for 1-second resolution and clock_gettime() for nanosecond resolution or gettimeofday() for microsecond resolution.
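One detail worth adding (not from the answer above, but relevant to "use default bsddb's key comparison"): the default B-tree comparison is a plain byte-wise lexicographic comparison, so the integer has to be stored most-significant byte first for byte order to match numeric order, and a signed value needs its sign bit flipped so dates before 1970 sort first. A minimal C sketch (the helper name is mine, not part of any API):

#include <stdint.h>
#include <time.h>

/* Pack a 64-bit POSIX timestamp into an 8-byte key whose memcmp()
   order matches numeric order: flip the sign bit (so values before
   1970 sort first) and store the bytes big-endian. Sketch only. */
void pack_time_key(time_t t, unsigned char key[8])
{
    uint64_t biased = (uint64_t)(int64_t)t ^ 0x8000000000000000ULL;
    for (int i = 0; i < 8; i++)
        key[i] = (unsigned char)(biased >> (56 - 8 * i));
}

If all of your keys are at or after the epoch, you may not need a C extension at all: Python's struct.pack(">Q", seconds) produces big-endian bytes with the same ordering property.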

Related

Is it possible to force mktime() to return a timestamp in UTC always?

Because the compiler/library the code will run on doesn't provide _mkgmtime, only mktime, I am forced to use mktime to convert broken-down time to a Unix timestamp and vice versa.
The old solution was to use _mkgmtime and gmtime to convert from broken-down time to a UNIX timestamp and vice versa. This worked until I tried to compile it and use it on my microcontroller.
Now I have to somehow use mktime to generate a UNIX timestamp from broken-down time, and then convert a UNIX timestamp back to broken-down time. Both in UTC.
Is it possible to force mktime() to return a timestamp in UTC always?
The C language specification says that the return value of mktime() is encoded the same way as that of time(), but it explicitly leaves that encoding unspecified. Thus, the answer depends on the C implementation where your code will run.
On a POSIX system such as Linux, time() returns an integer number of seconds since the epoch, which is defined in terms of UTC, not local time. Therefore, if your target machine is such a system then you don't need to do anything to get mktime to return a UTC timestamp.
HOWEVER, mktime assumes that its input is expressed in broken-down local time, and it will use the configured time zone (which is not included in the broken-down time) to perform the calculation. How the local time zone is configured is system dependent.
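If neither _mkgmtime nor timegm is available (as on the questioner's microcontroller), one common workaround is to skip mktime entirely and do the calendar arithmetic yourself. The sketch below is mine, not part of the answer above; the helper name utc_mktime is made up, and it follows the well-known days-from-civil-date formula. Like POSIX time_t it ignores leap seconds, and it assumes years after 1900:

#include <stdint.h>
#include <time.h>

/* Convert a broken-down UTC time to seconds since 1970-01-01 00:00:00
   UTC by pure arithmetic, with no time-zone lookup. Sketch only. */
int64_t utc_mktime(const struct tm *tm)
{
    int64_t y = tm->tm_year + 1900;
    int64_t m = tm->tm_mon + 1;                 /* 1..12 */
    if (m <= 2) { y -= 1; m += 12; }            /* count Jan/Feb as months 13/14 of the previous year */
    int64_t days = 365 * y + y / 4 - y / 100 + y / 400            /* whole years (Gregorian leap rule) */
                 + (153 * (m - 3) + 2) / 5 + tm->tm_mday - 1      /* days since 1 March of that year */
                 - 719468;                      /* rebase so 1970-01-01 becomes day 0 */
    return ((days * 24 + tm->tm_hour) * 60 + tm->tm_min) * 60 + tm->tm_sec;
}

The other widely used trick, temporarily setting TZ=UTC with setenv() and tzset() around a call to mktime(), works on full POSIX systems but not on a bare-metal target, which is why the arithmetic version is shown here.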

Will `gmtime()` report seconds as 60 when in a leap second?

I have a server running in TZ=UTC and I have code like this:
time_t t = time(NULL);
struct tm tm;
gmtime_r(&t, &tm);
The question is will tm.tm_sec == 60 when the server is within a leap second?
For example, if I were in the following time span:
1998-12-31T23:59:60.00 - 915 148 800.00
1998-12-31T23:59:60.25 - 915 148 800.25
1998-12-31T23:59:60.50 - 915 148 800.50
1998-12-31T23:59:60.75 - 915 148 800.75
1999-01-01T00:00:00.00 - 915 148 800.00
would gmtime() return tm == 1998-12-31T23:59:60 for time_t = 915148800 and, once out of the leap second, return tm == 1999-01-01T00:00:00 for the same time_t?
The short answer is, no, practically speaking gmtime_r will never fill in tm_sec with 60. This is unfortunate, but unavoidable.
The fundamental problem is that time_t is, per the Posix standard, a count of seconds since 1970-01-01 UTC assuming no leap seconds.
During the most recent leap second, the progression was like this:
1483228799 2016-12-31 23:59:59
1483228800 2017-01-01 00:00:00
Yes, there should have been a leap second, 23:59:60, in there. But there's no possible time_t value in between 1483228799 and 1483228800.
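For reference, here is a quick check (a sketch assuming a POSIX system with gmtime_r, as in the question) that prints how those two time_t values convert; neither of them produces tm_sec == 60:

#include <stdio.h>
#include <time.h>

int main(void)
{
    /* The two time_t values bracketing the 2016-12-31 leap second. */
    time_t stamps[] = { 1483228799, 1483228800 };
    for (int i = 0; i < 2; i++) {
        struct tm tm;
        char buf[32];
        gmtime_r(&stamps[i], &tm);
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm);
        printf("%ld -> %s\n", (long)stamps[i], buf);
    }
    return 0;
}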
I know of two ways for a gmtime variant to return a time ending in :60:
You can run your OS clock on something other than UTC, typically TAI or TAI-10, and use the so-called "right" timezones to convert to UTC (or local time) for display. See this web page for some discussion on this.
You can use clock_gettime() and define a new clkid value, perhaps CLOCK_UTC, which gets around the time_t problem by using deliberately nonnormalized struct timespec values when necessary. For example, the way to get a time value in between 1483228799 and 1483228800 is to set tv_sec to 1483228799 and tv_nsec to 1000000000. See this web page for more details.
Way #1 works pretty well, but nobody uses it because nobody wants to run their kernel clock on anything other than the UTC it's supposed to be. (You end up having problems with things like filesystem timestamps, and programs like tar that embed those timestamps.)
Way #2 is a beautiful idea, IMO, but to my knowledge it has never been implemented in a released OS. (As it happens, I have a working implementation for Linux, but I haven't released my work yet.) For way #2 to work, you need a new gmtime variant, perhaps gmtime_ts_r, which accepts a struct timespec instead of a time_t.
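To make way #2 concrete, the deliberately non-normalized representation described above might look like this (purely illustrative; no released OS defines such a CLOCK_UTC):

#include <time.h>

/* Hypothetical, per scheme #2: half-way through the 2016-12-31 leap
   second. tv_nsec >= 1000000000 signals "inside the leap second that
   follows tv_sec". */
struct timespec leap_example = {
    .tv_sec  = 1483228799,   /* 2016-12-31 23:59:59 UTC */
    .tv_nsec = 1500000000    /* +1.5 s, i.e. 23:59:60.5 */
};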
Addendum: I just reread your question title. You asked, "Will gmtime() report 60 for seconds when the server is on a Leap Second?" We could answer that by saying "yes, but", with the disclaimer that since most servers can't represent time during a leap second properly, they're never "on" a leap second.
Addendum 2: I forgot to mention that scheme #1 seems to work better for local times -- that is, when you're calling one of the localtime variants -- than for UTC times and gmtime. Clearly the conversions performed by localtime are affected by the setting of the TZ environment variable, but it's not so clear that TZ has any effect on gmtime. I've observed that some gmtime implementations are influenced by TZ and can therefore do leap seconds in accordance with the "right" zones, and some cannot. In particular, the gmtime in GNU glibc seems to pay attention to the leap second information in a "right" zone if TZ specifies one, whereas the gmtime in the IANA tzcode distribution does not.
The question is will tm.tm_sec == 60 when the server is within a leap second?
No. On a typical UNIX system, time_t counts the number of non-leap seconds since the epoch (1970-01-01 00:00:00 GMT). As such, converting a time_t to a struct tm will always yield a time structure with a tm_sec value between 0 and 59.
Ignoring leap seconds in time_t reckoning makes it possible to convert a time_t to a human-readable date/time without full knowledge of all leap seconds before that time. It also makes it possible to unambiguously convert time_t values in the future; including leap seconds would make that impossible, as the presence of a leap second isn't known beyond 6 months in the future.
There are a few ways that UNIX and UNIX-like systems tend to handle leap seconds. Most typically, either:
One time_t value is repeated for the leap second. (This is the result of a strict interpretation of standards, but will cause many applications to malfunction, as it appears that time has gone backwards.)
System time is run slightly slower for some time surrounding the leap second to "smear" the leap second across a wider period. (This solution has been adopted by many large cloud platforms, including Google and Amazon. It avoids any local clock inconsistencies, at the expense of leaving the affected systems up to half a second out of sync with UTC for the duration.)
The system time is set to TAI. Since this doesn't include leap seconds, no leap second handling is necessary. (This is rare, as it will leave the system several seconds out of sync with UTC systems, which make up most of the world. But it may be a viable option for systems which have little to no contact with the outside world, and hence have no way of learning of upcoming leap seconds.)
The system is completely unaware of leap seconds, but its NTP client will correct the clock after the leap second leaves the system's clock one second off from the correct time. (This is what Windows does.)
POSIX specifies the relationship between time_t "Seconds Since the Epoch" values and broken-down (struct tm) time exactly in a way that does not admit leap seconds or TAI, so essentially (up to some ambiguity about what should happen near leap seconds), POSIX time_t values are UT1, not UTC, and the results of gmtime reflect that. There is really no way to adapt or change this that's compatible with existing specifications and existing software based on them.
The right way forward is almost certainly a mix of what Google has done with leap second smearing and a standardized formula for converting back and forth between "smeared UTC" and "actual UTC" times (and thus also TAI) in the 24-hour window around a leap second and APIs to perform these conversions.
There is absolutely no easy answer to this. For tm_sec to be 60 during a leap second, you require 1) something in the OS to know there is a leap second due, and 2) the C library that you're using to also know about the leap second, and do something with it.
An awful lot of OSes and libraries don't.
The best I've found is modern versions of Linux kernel teamed up with gpsd and ntpd, using a GPS receiver as the time reference. GPS advertises leap seconds in its system datastream, and gpsd, ntpd and the Linux kernel can maintain CLOCK_TAI whilst the leap second is happening, and the system clock is correct too. I don't know if glibc does a sensible thing with the leap second.
On other UNIXes your mileage will vary. Considerably.
Windows is a ******* disaster area. For example the DateTime class in C# doesn't know about historical leap seconds. The system clock will jump 1 second next time a network time update is received.
I read this at www.cplusplus.com about gmtime: "Uses the value pointed by timer to fill a tm structure with the values that represent the corresponding time, expressed as a UTC time (i.e., the time at the GMT timezone)".
So there's a contradiction. UTC has seconds of absolutely constant length and therefore needs leap seconds, while GMT has days of exactly 86,400 seconds of very slightly varying lengths. gmtime() cannot at the same time work in UTC and GMT.
When we are told that gmtime() returns "UTC assuming no leap seconds" I would assume this means GMT. Which would mean there are no leap seconds recorded, and it would mean that the time slowly diverges from UTC, until the difference is about 0.9 seconds and a leap second is added in UTC, but not in GMT. That's easy to handle for developers but not quite accurate.
One alternative is to have constant seconds until you are close to a leap second, and then adjust the length of the roughly 1000 seconds around that leap second. That's also easy to handle, 100% accurate most of the time, with a 0.1% error in the length of a second during those 1000 seconds.
And the second alternative is to have constant seconds, have leap seconds, and then forget them. So gmtime() will return the same second twice in a row, going from x seconds 0 nanoseconds to x seconds 999999999 nanoseconds, then again from x seconds 0 nanoseconds to x seconds 999999999 nanoseconds, then to x+1 seconds. Which will cause trouble.
Of course having another clock that returns exact UTC including leap seconds, with exactly accurate seconds, would be useful, as would a clock that returns guaranteed exact GMT with no leap seconds and seconds that are almost, but not quite, constant in length. Translating "seconds since epoch" to year, month, day, hours, minutes, seconds requires knowledge of all leap seconds since the epoch (or before the epoch if you handle times before that).
Another angle to this problem is having a library that 'knows' about leap seconds. Most libraries don't, and so the answers you get from functions like gmtime are, strictly speaking, inaccurate during a leap second. Time-difference calculations also often produce inaccurate results when they straddle a leap second: for example, the time_t value you got at the same UTC time yesterday is exactly 86400 seconds smaller than today's value, even if a leap second actually occurred in between.
The astronomy community has solved this. Here is the SOFA Library that has proper time routines within. See their manual (PDF), the section on timescales. If made part of your software and kept up to date (a new version is needed for each new leap second) you have accurate time calculations, conversions and display.

How does the C function time() treat fractional seconds?

The time() function will return the seconds since 1970. I want to know how it does the rounding for the second returned.
For example, for 100.4s, will it return 100 or 101?
Is there an explicit definition?
The ISO C standard doesn't say much. It says only that time() returns
the implementation’s best approximation to the current calendar time
The result is of type time_t, which is a real type (integer or floating-point) "capable of representing times".
A lot of systems implement time_t as a signed integer type representing the whole number of seconds since 1970-01-01 00:00:00 GMT.
A quick experiment on my Ubuntu system (comparing the values returned by time() and gettimeofday()) indicates that the value returned by time() is truncated, so for example if the high-precision time is 1514866171.750058, the time() function will return 1514866171. Neither ISO C nor POSIX guarantees this, but I'd expect it to behave consistently on any UNIX-like systems.
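A minimal version of that experiment might look like the following (POSIX-only; it simply prints both clocks side by side so the truncation is visible):

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);       /* seconds + microseconds */
    time_t t = time(NULL);         /* whole seconds only */
    printf("gettimeofday: %ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
    printf("time():       %ld\n", (long)t);
    return 0;
}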
7.27.2.4p3
The time function returns the implementation's best approximation to
the current calendar time. The value (time_t)(-1) is returned if the
calendar time is not available. If timer is not a null pointer, the
return value is also assigned to the object it points to.
It's implementation defined, so unless you specify your compiler and operating system, the answer is "best approximation".
Is there an explicit definition?
No
http://en.cppreference.com/w/c/chrono/time
The encoding of calendar time in time_t is unspecified, but most systems conform to POSIX specification and return a value of integral type holding the number of seconds since the Epoch. Implementations in which time_t is a 32-bit signed integer (many historical implementations) fail in the year 2038.

Converting elisp unixtime into js-date object

For a specific application I need to handle Elisp's internal Unix time format in JavaScript. Elisp's (current-time) comes in this special format:
current-time is a built-in function in `editfns.c'.
(current-time)
Return the current time, as the number of seconds since 1970-01-01 00:00:00.
The time is returned as a list of integers (HIGH LOW USEC PSEC).
HIGH has the most significant bits of the seconds, while LOW has the
least significant 16 bits. USEC and PSEC are the microsecond and
picosecond counts.
So I'm getting a time string: [21039,58064,0] (JSON representation of (21039 58064 0)). How can I convert this into a JS Date object with JavaScript? It's easy in Emacs, but that is not an option.
new Date((21039 * Math.pow(2, 16) + 58064) * 1000);
Note that you don't need to write Math.pow(2, 16) literally, since it is a constant expression (65536); it is spelled out so you can see what is going on. The multiplication by 1000 is needed because the JavaScript Date constructor expects milliseconds, while the Elisp value counts seconds.
Also note that JavaScript's bitwise operations truncate their operands to 32-bit signed integers, so they can't be used on numbers that may not fit in 32 bits; you have to multiply instead of shifting and add instead of OR-ing.

RRD with high precision?

Is it possible to use RRDs with high precision? And by high precision I mean e.g. in the range of milliseconds.
If not, are there equally good alternatives to RRD with a C API that work under Linux?
The step size in rrdtool is an integer number of seconds and thus cannot be less than one second. BUT updates can carry a millisecond-precision timestamp and will be handled correctly. It is just that you cannot store samples more often than once per second.
