My application runs as multiple processes, each with multiple threads, and all of them write to the same database. As soon as a request is placed, a unique request ID is generated for the record to be added to the proprietary DB. Here are our constraints: the ID cannot be longer than 9 characters, and the first 6 characters must be hhmmss. We decided to use the millisecond value for the last 3 characters to complete the 9, and we obtain all of this with gettimeofday(). However, with increased traffic there are now collisions whenever multiple requests are placed within the same millisecond. Combined with the fact that gettimeofday() itself is not very precise, this is causing an increasing number of collisions. I tried clock_gettime(), but when tested it did not look much more accurate either, as I observed with the following test program:
We couldn't use static or global variables due to threading issues
We can't use random numbers, as the IDs need to be sequential
Appreciate any help.
#include <stdio.h>
#include <time.h>

int main( int argc, char **argv )
{
    long i;
    struct timespec start, stop;
    double gap;

    clock_gettime( CLOCK_REALTIME, &start );
    for ( i = 0; i < 123456789; i++ )
        ;                               /* busy loop to burn some time */
    clock_gettime( CLOCK_REALTIME, &stop );

    /* elapsed time in milliseconds: seconds scaled up, nanoseconds scaled down */
    gap = ( stop.tv_sec - start.tv_sec ) * 1000.0
        + ( stop.tv_nsec - start.tv_nsec ) / 1000000.0;
    printf( "%lf ms\n", gap );
    return 0;
}
The type of problem you are describing has already been more-or-less solved by issuing a UUID. This is a system that is designed to solve all the problems you mention and some more.
A linux library: http://linux.die.net/man/3/uuid
More information is available here: http://en.wikipedia.org/wiki/Universally_unique_identifier
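For illustration, a minimal sketch of generating and printing a UUID with libuuid (this assumes the libuuid development headers are installed and that you link with -luuid):

#include <stdio.h>
#include <uuid/uuid.h>   /* from libuuid; link with -luuid */

int main( void )
{
    uuid_t id;
    char   text[37];     /* 36 characters plus terminating NUL */

    uuid_generate( id );        /* fill id with a freshly generated UUID */
    uuid_unparse( id, text );   /* format it as the usual hex-and-dash string */
    printf( "%s\n", text );
    return 0;
}

Note that a UUID is of course far longer than 9 characters, so this only helps if the length constraint can be relaxed.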
Using a time stamp as a unique ID will never work reliably unless you limit yourself to only one transaction per lowest clock tick (1 millisecond in this case).
Since you are stuck using a time value for the first 6 of 9 bytes you need to try to fit as much range into the last 3 bytes as possible.
If you can get away with not using printable ASCII characters in the last 3 bytes, you should avoid them, since that restriction greatly limits the values those bytes can hold. If possible, treat these bytes as a 24-bit integer (a range of 16,777,216) and have each transaction increment the counter. You could then reset it to 0 each time gettimeofday() tells you the time has changed (or set up a repeating SIGALRM to tell you when to call gettimeofday() again to update your timestamp and zero the 24-bit integer).
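A minimal sketch of that idea, assuming C11 atomics are acceptable as the shared state (the question rules out plain globals because of threading, but an atomic counter avoids the race); the helper name make_request_id and the raw-byte layout are made up for illustration:

#include <stdatomic.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* One shared 24-bit counter; reset whenever the wall-clock second changes. */
static atomic_uint counter;
static atomic_long last_second;

/* Hypothetical helper: build a 9-byte ID = hhmmss + 3 raw bytes of a 24-bit counter. */
void make_request_id( unsigned char id[9] )
{
    struct timeval tv;
    gettimeofday( &tv, NULL );

    long prev = atomic_load( &last_second );
    if ( tv.tv_sec != prev &&
         atomic_compare_exchange_strong( &last_second, &prev, tv.tv_sec ) )
        atomic_store( &counter, 0 );          /* new second: restart the sequence */

    unsigned seq = atomic_fetch_add( &counter, 1 ) & 0xFFFFFFu;

    struct tm tm_buf;
    localtime_r( &tv.tv_sec, &tm_buf );
    char hhmmss[7];
    strftime( hhmmss, sizeof hhmmss, "%H%M%S", &tm_buf );
    memcpy( id, hhmmss, 6 );

    id[6] = (unsigned char)( seq >> 16 );     /* 24-bit counter as 3 raw bytes */
    id[7] = (unsigned char)( seq >> 8 );
    id[8] = (unsigned char)( seq );
}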
If you are forced to use printable ASCII characters for these bytes then things are a little more difficult. The easiest way to extend the range would be to use hexadecimal rather than decimal digits, which grows the representable range from 1000 to 4096. You can do better still with an even larger number base: tacking on the first 22 letters of the alphabet (the same way hex tacks on the first 6) lets you represent 32x32x32 = 32768 values, which would be a lot of transactions per second. You can extend the numeric alphabet even further, but it becomes more piecemeal as you do, since you will probably want to keep some characters from appearing in the value. Using a representation that strtol or strtoul can work with directly will likely be easier to program.
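For illustration, a small sketch of encoding a counter into 3 base-32 characters, assuming the alphabet is the digits 0-9 followed by the letters A-V (the same ordering strtoul(..., 32) accepts):

#include <stdio.h>

/* Encode a counter value (0..32767) into 3 base-32 digits: 0-9 then A-V. */
static void encode_base32( unsigned value, char out[3] )
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
    out[0] = digits[ ( value >> 10 ) & 0x1F ];
    out[1] = digits[ ( value >> 5 )  & 0x1F ];
    out[2] = digits[ value & 0x1F ];
}

int main( void )
{
    char suffix[4] = { 0 };
    encode_base32( 12345, suffix );
    printf( "%s\n", suffix );    /* prints "C1P" (12345 = 12*1024 + 1*32 + 25) */
    return 0;
}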
If your application is multithreaded then you may want to consider using part of your numeric range as a thread ID and letting each thread keep its own transaction counter. This makes it harder to determine the relative timing of two transactions processed by different threads, but it keeps the threads from all wanting to increment the same memory location (which would require a mutex or semaphore).
Generally, using clock time with sub-second resolution on a heavily loaded system like this is a bad idea anyhow. Threads will take their timestamp and then be descheduled in the middle of the operation, so you will see things arriving out of order.
Three characters left to encode things uniquely is not much. At the very least, try to use a denser encoding such as base64.
If you use gcc as your compiler you have thread-local storage (TLS) as an extension, and it is quite efficient: just prefix your static variable with __thread. If you are restricted to plain pthreads, there are also thread-specific keys (pthread_key_create / pthread_getspecific). But better still would be to keep the information on the thread's own stack for as long as possible.
To obtain a per-thread counter that makes a serial number for your request, use:
- your hhmmss timestamp, as so far
- as many bits as you need to identify your threads
- the last bits for the per-thread serial number, which as above should only wrap around after more than a second
A sketch of this layout is shown below.
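A minimal sketch of that layout, assuming gcc's __thread TLS from above, that each thread is handed a small numeric ID at startup, and that 3 raw bytes follow the hhmmss prefix; the function name build_id and the 8-bit/16-bit split are illustrative only:

#include <string.h>
#include <time.h>

/* Illustrative split of the 24 bits after hhmmss:
 *   8 bits  -> thread ID (up to 256 threads)
 *  16 bits  -> per-thread serial number (up to 65536 requests per thread per second)
 */
static __thread unsigned per_thread_serial;   /* gcc TLS: one counter per thread, no locking */

void build_id( unsigned thread_id, unsigned char id[9] )
{
    time_t    now = time( NULL );
    struct tm tm_buf;
    localtime_r( &now, &tm_buf );

    char hhmmss[7];
    strftime( hhmmss, sizeof hhmmss, "%H%M%S", &tm_buf );
    memcpy( id, hhmmss, 6 );

    unsigned serial = per_thread_serial++ & 0xFFFFu;

    id[6] = (unsigned char)( thread_id & 0xFFu );   /* which thread produced it */
    id[7] = (unsigned char)( serial >> 8 );         /* per-thread serial, high byte */
    id[8] = (unsigned char)( serial );              /* per-thread serial, low byte */
}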
You could even cheat and make a thread yield when it fires too many requests within the same second.
You could give each thread of each process a unique ID at startup; this would take only one of the 3 available characters unless you have hundreds of threads. Each thread can then use a local counter to set the last two characters (using base64 or an even larger alphabet, depending on which characters are allowed, to get enough range).
In this scheme the only case where a collision can happen is if a thread's counter wraps within the same second.
Of course, this is a dirty hack. The Right Way would be to share a resource amongst the threads/processes. It might still be the simplest solution in your case, though.
Related
I have the following code:
#include <stdio.h>
#include <time.h>

int main(){
    clock_t timerS;
    int i=1, targetTime=2;

    scanf("%d", &targetTime);
    while(i!=0){
        timerS = clock();
        while ((double)((clock() - timerS) / CLOCKS_PER_SEC) < targetTime){
            //do something
        }
        //do another thing but delayed by the given time
        if(targetTime>=0.5)
            targetTime-=0.02;
        else i=0;
    }
    return 0;
}
What I want is a loop which does something for (initially) an inputted number of seconds, and which also does another thing after targetTime seconds have passed.
After the first iteration, the speed at which these operations are done should change (more specifically, by -0.02 seconds in this case).
An example would be collecting user input for 2 seconds, then afterwards displaying all the inputs made in those 2 seconds.
First problem is
If the initial given time is smaller than 1 second (for example 0.6), the other thing isn't delayed by 0.6 seconds, but is done immediately.
Second problem is
Actually similar to the first, if I subtract 0.02 seconds (in this case) from targetTime, it again does the other thing immediately and not in targetTime-0.02 seconds as I intend it to.
I'm new to this "clock" and "time" topic in C so I guess I'm doing something wrong regarding how these operations should be done. Also, please don't give an overly-complicated explanation/solution because of the above-mentioned reason.
Thanks!
Don't use the clock(3) call, as it is obsolete for this purpose and has been fully superseded by machine-independent replacements.
If your system supports it, you can use clock_gettime(2), which gives you up to nanosecond precision (depending on the platform, but at least on Linux on Intel architectures that is almost guaranteed), or, failing that, you'll at least have gettimeofday(2), which comes from the BSD systems and provides a clock with microsecond resolution.
If you want to stop your program for some delay, you also have sleep(3) (second based), usleep(3) (microsecond based) or even nanosleep(2) (nanosecond based).
Anyway, all of these calls use a tick that is not tied to the system heartbeat, so the resolution is uniform and not system dependent.
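For the question above, a small sketch of measuring a fractional-second delay with clock_gettime(2), assuming CLOCK_MONOTONIC is available (as it is on any reasonably recent Linux; on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

/* Return elapsed seconds between two timespecs as a double. */
static double elapsed_seconds( const struct timespec *start,
                               const struct timespec *end )
{
    return ( end->tv_sec - start->tv_sec )
         + ( end->tv_nsec - start->tv_nsec ) / 1e9;
}

int main( void )
{
    double targetTime = 0.6;          /* fractional targets now work */
    struct timespec start, now;

    clock_gettime( CLOCK_MONOTONIC, &start );
    do {
        /* do something */
        clock_gettime( CLOCK_MONOTONIC, &now );
    } while ( elapsed_seconds( &start, &now ) < targetTime );

    printf( "waited %.3f s\n", elapsed_seconds( &start, &now ) );
    return 0;
}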
I mistakenly initialized targetTime as an int instead of a double. Changing it to double solves the issue easily. Sorry!
Does anybody have an idea how the time() function works?
I was looking online for implementations out of pure curiosity, but could only find the NetBSD implementation of difftime()
Also is there anything that describes the process of calculating the time (non system specific or system specific)?
Note: I am not looking for answers on how to use time() but how it actually works behind the scenes when I call it.
Somewhere deep down in your computer, typically in hardware, there's a clock oscillator running at some frequency f. For the purposes of this example let's say that it's operating at 1 kHz, or 1,000 cycles per second. Things are set up so that every cycle of the oscillator triggers a CPU interrupt.
There's also a low-level counter c. Every time the clock interrupt is triggered, the OS increments the counter. For the moment we'll imagine it increments it by 1, although this won't usually be the case in practice.
The OS also checks the value of the counter as it's incremented. When c equals 1,000, this means that exactly one second has gone by. At this point the OS does two things:
1. It increments another counter variable, the one that's keeping track of the actual time of day in seconds. We'll call this other counter t. (It's going to be a big number, so it'll be at least a 32-bit variable, or these days, 64 bits if possible.)
2. It resets c to 0.
Finally, when you call time(), the kernel simply returns you the current value of t. It's pretty simple, really!
Well, actually, it's somewhat more complicated than that. I've overlooked the details of how the value of the counter t gets set up initially, and how the OS makes sure that the oscillator is running at the right frequency, and a few other things.
When the OS boots, and if it's on a PC or workstation or mainframe or other "big" computer, it's typically got a battery-backed real-time clock it can use to set the initial value of t from. (If the CPU we're talking about is an embedded microcontroller, on the other hand, it may not have any kind of clock, and all of this is moot, and time() is not implemented at all.)
Also, when you (as root) call settimeofday, you're basically just supplying a value to jam into the kernel's t counter.
Also, of course, on a networked system, something like NTP is busy keeping the system's time up-to-date.
NTP can do that in two ways:
If it notices that t is way off, it can just set it to a new value, more or less as settimeofday() does.
If it notices that t is just a little bit off, or if it notices that the underlying oscillator isn't counting at quite the right frequency, it can try to adjust that frequency.
Adjusting the frequency sounds straightforward enough, but the details can get pretty complicated. You can imagine that the frequency f of the underlying oscillator is adjusted slightly. Or, you can imagine that f is left the same, but when the time interrupt fires, the numeric increment that's added to c is adjusted slightly.
In particular, it won't usually be the case that the kernel adds 1 to c on each timer interrupt, and that when c reaches 1,000, that's the indication that one second has gone by. It's more likely that the kernel will add a number like 1,000,000 to c on each timer interrupt, meaning that it will wait until c has reached 1,000,000,000 before deciding that one second has gone by. That way, the kernel can make more fine-grained adjustments to the clock rate: if things are running just a little slow, it can change its mind, and add 1,000,001 to c on each timer interrupt, and this will make things run just a tiny bit faster. (Something like one part per million, as you can pretty easily see.)
One more thing I overlooked is that time() isn't the only way of asking what the system time is. You can also make calls like gettimeofday(), which gives you a sub-second time stamp represented as seconds+microseconds (struct timeval), or clock_gettime(), which gives you a sub-second time stamp represented as seconds+nanoseconds (struct timespec). How are those implemented? Well, instead of just reading out the value of t, the kernel can also peek at c to see how far into the next second it is. In particular, if c is counting up to 1,000,000,000, then the kernel can give you microseconds by dividing c by 1,000, and it can give you nanoseconds by returning c directly.
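To make that concrete, here is a toy sketch of how a kernel might derive those values from t and c, assuming c counts up to 1,000,000,000 per second as in the example above (this is purely illustrative, not any real kernel's code):

#include <stdio.h>
#include <stdint.h>

/* Toy model of the kernel's state from the text above. */
static int64_t t;   /* whole seconds since the epoch         */
static int64_t c;   /* sub-second progress, counting to 1e9  */

struct toy_timespec { int64_t sec; int64_t nsec; };
struct toy_timeval  { int64_t sec; int64_t usec; };

struct toy_timespec toy_clock_gettime( void )
{
    /* nanoseconds: c is already in billionths of a second */
    struct toy_timespec ts = { t, c };
    return ts;
}

struct toy_timeval toy_gettimeofday( void )
{
    /* microseconds: divide the sub-second counter by 1,000 */
    struct toy_timeval tv = { t, c / 1000 };
    return tv;
}

int main( void )
{
    t = 1700000000;    /* pretend values */
    c = 123456789;
    struct toy_timeval tv = toy_gettimeofday();
    printf( "%lld.%06lld\n", (long long)tv.sec, (long long)tv.usec );
    return 0;
}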
Two footnotes:
(1) If we've adjusted the frequency, and we're adding 1,000,001 to c on each low-level timer tick, c will usually not hit 1,000,000,000 exactly, so the test when deciding whether to increment t will have to involve a greater-than-or-equal-to condition, and we'll have to subtract 1,000,000,000 from c, not just clear it. In other words, the code will look something like
if(c >= 1000000000) {
t++;
c -= 1000000000;
}
(2) Since time() and gettimeofday() are two of the simplest system calls around, and since programs calling them may (by definition) be particularly sensitive to any latency due to system call overhead, these are the calls that are most likely to be implemented based on the vDSO mechanism, if it's in use.
The C specification does not say anything about how library functions work. It only states the observable behavior. The internal workings are both compiler and platform dependent.
Synopsis
#include <time.h>
time_t time(time_t *timer);
Description
The time function determines the current calendar time. The encoding of the value is unspecified.
Returns
The time function returns the implementation's best approximation to the current calendar time. The value (time_t)(-1) is returned if the calendar time is not available. If timer is not a null pointer, the return value is also assigned to the object it points to.
https://port70.net/~nsz/c/c11/n1570.html
Here is one implementation (glibc's fallback stub, which simply fails with ENOSYS when the system provides no real time() implementation):
time_t
time (timer)
     time_t *timer;
{
  __set_errno (ENOSYS);
  if (timer != NULL)
    *timer = (time_t) -1;
  return (time_t) -1;
}
https://github.com/lattera/glibc/blob/master/time/time.c
Can someone please tell me how this function works? I'm using it in code and have an idea how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N counting down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different inputs for N), do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1mS to 65.5 Seconds delay
/* Note that FCY is used in the computation. Please make the necessary
   changes (PLLx4 or PLLx8 etc.) to compute the right FCY as in the define
   statement above. */
void DelayNmSec(unsigned int N)
{
    unsigned int j;

    while (N--)                        /* one pass per requested millisecond  */
        for (j = 0; j < MILLISEC; j++)
            ;                          /* empty loop calibrated to take ~1 ms */
}
This is referred to as busy waiting, a concept that just burns some CPU cycles thus "waiting" by keeping the CPU "busy" doing empty loops. You don't need to reset the function, it will do the same if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, every time counting with j from 0 to MILLISEC, which is supposedly a constant that depends on the CPU clock.
The original author of the code has timed it and looked at the generated assembly to find the exact number of instructions executed per millisecond, and has configured the MILLISEC constant so that the for loop busy-waits for about that long.
The input parameter N is then simply the number of milliseconds the caller wants to wait, i.e. the number of times the for loop is executed.
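For illustration, a sketch of how MILLISEC might be derived and the routine called, assuming a hypothetical 8 MHz instruction clock (FCY); in the real source these defines would sit above DelayNmSec():

/* Hypothetical clock setup: an 8 MHz instruction clock (FCY). */
#define FCY       8000000UL
#define MILLISEC  (FCY / 10000)       /* per the question: Fcy/10000 loop iterations ~ 1 ms */

void DelayNmSec(unsigned int N);      /* the busy-wait routine shown in the question */

int main(void)
{
    for (;;)
    {
        /* toggle_led();              -- hypothetical application work */
        DelayNmSec(500);              /* burn CPU for roughly half a second */
    }
}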
The code will break if
- it is used on a different or faster microcontroller (depending on how Fcy is maintained), or
- the optimization level of the C compiler is changed, or
- the C compiler version is changed (as it may generate different code)
so, if the guy who wrote it is clever, there may be a calibration program which defines and configures the MILLISEC constant.
This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems in that on different processors with different speeds, the computation needs to be adjusted. Old games used this approach and I remember a simulation using this busy wait approach that targeted an old 8086 type of processor to cause an animation to move smoothly. When the game was used on a Pentium processor PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to see what the animation was.
This sort of busy wait means that the running thread sits in a computation loop, counting down for the requested number of milliseconds. The result is that the thread does nothing else while it is counting down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.
I'm working on an interrupt handler with a hardware design group and we're trying to figure out where a bug is. I'm reading a chip over the SPI bus at 5khz. The chip loads 4 bytes and triggers a data ready pin.
My interrupt handler wakes up and reads 4 bytes off the SPI bus and stores the data in a buffer. Strangely enough though, every 17th read gives 4 bytes of all 0's, which is not right. One of the options we're exploring is that the chip isn't always actually ready when it sends the data ready signal.
So, I know I can't sleep in an interrupt handler, but I'd like to try and introduce a delay of 10 or 20 microseconds. Right now I have a for loop which counts to 100,000 then processes the interrupt. I haven't seen any changes, so I thought I might see if someone has a better technique for busy waiting. Or at least a better way of figuring out how many loop iterations I should go through, as I'm not sure how long this takes, or if the compiler is simply optimizing out the whole thing.
I don't know if you have access to any pseudorandom number generation libraries on your embedded device, but doing a large-number multiplication followed by a mod will definitely take some cycles. Instead of simply adding 1 (which is very fast at the hardware level, and which the compiler can optimize into shifts since you're doing it a fixed number of times), use a random number seed if one is available (does the system have access to a time clock?) and do large-number multiplication, modulus or factorial operations; negative-number division also takes forever. Remember, division takes the longest at the hardware level. Use that to your advantage.
I assume your compiler will strip out a simple loop.
You should use volatile.
volatile unsigned long i;

for (i = 0; i < 1000000; i++)
    continue;
I assume also that this will not remove the problem or help you.
I can't believe that an SPI peripheral has such a bug.
But it's possible that you read the data from the SPI FIFO too slowly,
so some of the received data gets dropped.
You should check the error flags of the SPI module, as well as its RX-empty and RX-full flags.
At memory 0x100 and 0x104 are two 32-bit counters. They represent a 64-bit timer and are constantly incrementing.
How do I correctly read from two memory addresses and store the time as a 64-bit integer?
One incorrect solution:
x = High
y = Low
result = x << 32 + y
(The program could be swapped out and in the meantime Low overflows...)
Additional requirements:
Use C only, no assembly
The bus is 32-bit, so no way to read them in one instruction.
Your program may get context switched at any time.
No mutex or locks available.
Some high-level explanation is okay. Code not necessary. Thanks!
I learned this from David L. Mills, who attributes it to Leslie Lamport:
1. Read the upper half of the timer into H.
2. Read the lower half of the timer into L.
3. Read the upper half of the timer again into H'.
4. If H == H' then return {H, L}, otherwise go back to 1.
Assuming that the timer itself updates atomically then this is guaranteed to work -- if L overflowed somewhere between steps 1 and 2, then H will have incremented between steps 1 and 3, and the test in step 4 will fail.
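A minimal C sketch of that loop, assuming the two counter words are mapped at 0x100 (high half) and 0x104 (low half) as in the question and can be read with plain volatile 32-bit loads:

#include <stdint.h>

#define TIMER_HIGH ((volatile uint32_t *)0x100)   /* upper 32 bits, per the question */
#define TIMER_LOW  ((volatile uint32_t *)0x104)   /* lower 32 bits */

uint64_t read_timer64(void)
{
    uint32_t high, low, high2;

    do {
        high  = *TIMER_HIGH;     /* 1. read the upper half                           */
        low   = *TIMER_LOW;      /* 2. read the lower half                           */
        high2 = *TIMER_HIGH;     /* 3. read the upper half again                     */
    } while (high != high2);     /* 4. retry if the low word rolled over in between  */

    return ((uint64_t)high << 32) | low;
}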
Given the nature of the memory (a timer), you should be able to read A, read B, read A' and compare A to A'; if they match, you have your answer. Otherwise repeat.
It sort of depends on what other constraints there are on this memory. If it's something like a system clock, the above handles the situation where 0x0000FFFF goes to 0x00010000; depending on the order you read the words in, you would otherwise erroneously end up with 0x00000000 or 0x0001FFFF.
In addition to what has already been said, you won't get more accurate timing reads than your interrupt / context-switch jitter allows. If you fear an interrupt or context switch in the middle of polling the timer, the solution is not to adopt some strange read-read-read-compare algorithm, nor is it to use memory barriers or semaphores.
The solution is to use a hardware interrupt for the timer, with an interrupt service routine that cannot be interrupted when executed. This will give the highest possible accuracy, if you actually have need of such.
The obvious and presumably intended answer is already given by Hobbs and jkerian:
1. sample High
2. sample Low
3. read High again - if it differs from the sample from step 1, return to step 1
On some multi-CPU/core hardware, this doesn't actually work properly. Unless you have a memory barrier ensuring that you're not reading High and Low from your own core's cache, updates from another core - even if 64-bit atomic and flushed to some shared memory - aren't guaranteed to become visible to your core in a timely fashion. While High and Low must be volatile-qualified, that alone is not sufficient.
The higher the frequency of updates, the more probable and significant the errors due to this issue.
There is no portable way to do this without some C wrappers for OS/CPU-specific memory barriers, mutexes, atomic operations etc..
Brooks' comment below mentions that this does work for certain CPUs, such as modern AMDs.
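If C11 is available, such a wrapper might look roughly like the sketch below, using acquire fences to keep the three reads in program order; whether a fence alone is sufficient for your particular cores and memory mapping is exactly the platform-specific question this answer raises:

#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for the two shared words; in reality they would be updated by
   another core or mapped onto the hardware counter. */
volatile uint32_t timer_high;
volatile uint32_t timer_low;

uint64_t read_timer64_fenced(void)
{
    uint32_t high, low, high2;

    do {
        high = timer_high;
        atomic_thread_fence(memory_order_acquire);  /* keep this load ordered before the next */
        low = timer_low;
        atomic_thread_fence(memory_order_acquire);  /* and before the re-read of the high word */
        high2 = timer_high;
    } while (high != high2);

    return ((uint64_t)high << 32) | low;
}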
If you can guarantee that the maximum duration of a context switch is significantly less than half the low word's rollover period, you can use that fact to decide whether the Low value was read before or after its rollover, and choose the correct high word accordingly.
H1 = High;  L = Low;  H2 = High;
if (H2 != H1 && L < 0x7FFFFFFF) { H1 = H2; }   /* Low was read after the rollover */
result = ((uint64_t)H1 << 32) + L;
This avoids the 'repeat' phase of other solutions.
The problem statement didn't say whether the counters could roll over all 64 bits several times between reads. So I might try alternately reading both 32-bit words a few thousand times (more if needed), storing them in two arrays, running a linear regression fit modulo 2^32 against both, applying slope-matching constraints of that ratio to the possible results, and then using the estimated regression fit to predict the count value back at the desired reference time.