Why is the total time taken by Google Dataflow more than the sum of the times taken by the individual steps?

I am really unable to understand why the total elapsed time for a Dataflow job is so much higher than the time taken by its individual steps.
For example, the total elapsed time for the Dataflow job in the picture is 2 min 39 sec, while the time spent in the individual steps is just 10 sec. Even if we account for the setup and teardown phases, there is a difference of 149 sec, which seems like too much.
Is there some other way of reading the individual stage timings, or am I missing something else?
Thanks

In my view, 2 min 39 sec is fine. Your pipeline reads a file, runs a ParDo, and then writes the results to BigQuery.
Several factors go into this time:
How much data you need to process, i.e. in your case I don't think you are processing much data.
What computation you are doing, i.e. your ParDo step takes only 3 sec, so with a small amount of data the ParDo does not have much computation to do either.
Writing to BigQuery, i.e. in your case it takes only 5 sec.
The creation and teardown phases of the Dataflow job stay roughly constant; in your case they account for 149 sec. Your job itself takes only 10 sec, which depends on the three factors explained above.
Now assume you have to process 2 million records and each record's transform takes 10 sec. In that case the processing time would be much higher, i.e. 10 sec * 2 million records for a single-node Dataflow job.
In that scenario the 149 sec is negligible compared with the whole job's completion time, because the 149 sec stays the same no matter how many records are processed.
Hope this information helps you understand the timing.
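For reference, a pipeline with the shape described above (read a file, apply a ParDo, write to BigQuery) looks roughly like the Beam Python sketch below. The project, bucket, table and the ParseLine transform are placeholder assumptions, not details from the question:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    class ParseLine(beam.DoFn):
        """Hypothetical transform: turn one CSV line into a BigQuery row dict."""
        def process(self, line):
            name, value = line.split(",")
            yield {"name": name, "value": int(value)}

    # Placeholder project, bucket and table names.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.csv")
         | "Transform" >> beam.ParDo(ParseLine())
         | "Write" >> beam.io.WriteToBigQuery(
               "my-project:my_dataset.my_table",
               schema="name:STRING,value:INTEGER",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

Even for a tiny job like this, the elapsed time shown on the job page includes provisioning and tearing down the workers, which is why the per-step timings add up to far less than the wall-clock total.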

Related

A performance issue resulting from "limit 0" in TDengine database

limit 0 is suspected of causing a full table query bug.
After switching from 2.6.0.32 to 3.0.2.1 today, I found that the CPU usage of the three nodes (each node has a 32-core CPU) exceeded 90%, while in the original 2.6.0.32 environment the CPU usage never went above 10%; the figure below shows one of the nodes.
Looking at show queries, I found two statements of the form select * from t XXX limit 0;
Between the two environments the execution times differ by a factor of more than 30,000. The 3.0.2.1 time is as follows:
The 2.6.0.32 time is as follows:
Next, after changing the statement to select * from t XXX limit 1 in the 3.0.2.1 environment, the time spent dropped from 74 seconds to 0.03 seconds and returned to normal, as shown in the figure below.
Finally, here is the comparison chart of the two environments (after the CPU usage of 3.0 came down, the query speed improved).
In addition, the configuration and table structure of the two environments are identical. Even at that level of detail, the 18 ms query time in 2.6.0.32 is still lower than the 30 ms in 3.0.2.1.

GCP Documentation - Task Queue bucket_size and rate

I have read a lot of articles and answers here about Google Tasks; my doubt is about the behavior of "rate" and "bucket_size".
I read this documentation:
https://cloud.google.com/appengine/docs/standard/java/configyaml/queue
The snippet is:
Configuring the maximum number of concurrent requests
If the default max_concurrent_requests settings are not sufficient, you can change the settings for max_concurrent_requests, as shown in the following example:
If your application queue has a rate of 20/s and a bucket size of 40,
tasks in that queue execute at a rate of 20/s and can burst up to 40/s
briefly. These settings work fine if task latency is relatively low;
however, if latency increases significantly, you'll end up processing
significantly more concurrent tasks. This extra processing load can
consume extra instances and slow down your application.
For example, let's assume that your normal task latency is 0.3
seconds. At this latency, you'll process at most around 40 tasks
simultaneously. But if your task latency increases to 5 seconds, you
could easily have over 100 tasks processing at once. This increase
forces your application to consume more instances to process the extra
tasks, potentially slowing down the entire application and interfering
with user requests.
You can avoid this possibility by setting max_concurrent_requests to a
lower value. For example, if you set max_concurrent_requests to 10,
our example queue maintains about 20 tasks/second when latency is 0.3
seconds. However, when the latency increases over 0.5 seconds, this
setting throttles the processing rate to ensure that no more than 10
tasks run simultaneously.
queue:
# Set the max number of concurrent requests to 50
- name: optimize-queue
  rate: 20/s
  bucket_size: 40
  max_concurrent_requests: 10
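The "over 100 tasks" figure in the passage above is just the dispatch rate multiplied by the task latency. A quick back-of-the-envelope check in Python, using only the numbers from the documentation example:

    rate = 20          # tasks dispatched per second
    latency = 5.0      # seconds each task takes once things slow down
    in_flight = rate * latency
    print(in_flight)   # 100.0 -> tasks running at the same time

The bucket only limits how quickly tasks can start; tasks that are already running do not hold a token, so when latency grows the number of concurrent tasks grows with it.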
I understood that the queue works like this:
The bucket is the unit that determines how many tasks can be executed.
The rate is how much the bucket is refilled per period.
max_concurrent_requests is the maximum number of tasks that can run simultaneously.
This snippet seems strange to me:
But if your task latency increases to 5 seconds, you could easily have
over 100 tasks processing at once. This increase forces your
application to consume more instances to process the extra tasks,
potentially slowing down the entire application and interfering with
user requests.
Imagine that max_concurrent_requests is not set.
To me, it seems impossible to execute more than 100 tasks, because the bucket_size is 40. To me, slow tasks would only affect how long tasks wait for a free slot in the bucket.
Why does the documentation say there can be over 100 tasks?
If the bucket is 40, can more than 40 run simultaneously?
Edit
Is the bucket refilled only after all the running tasks have finished, or is a freed slot refilled at the next rate interval?
Example:
40 tasks are executing (the bucket is full).
1 task finishes.
Imagine that each task takes more than 0.5 seconds, and some take more than 1 second.
When 1 slot is freed, is it refilled in the next second, or does the bucket wait for all the tasks to finish before it is filled up again?
Bucket size is defined more precisely in the doc you link, but one way to think of it is as a kind of initial burst limit.
Here's how I understand it would work, based on the parameters you provided in your question:
bucket_size: 40
rate: 20/s
max_concurrent_requests: 10
In the first second (t1) 40 tasks will start processing. At the same time 20 tokens (based on the rate) will be added to the bucket. Thus, at t2, 20 tasks will be primed for processing and another 20 tokens will be added to the bucket.
If there is no max_concurrent_requests setting, those 20 tasks would start processing. If max_concurrent_requests is 10, nothing will happen, because more than 10 processes are already in use.
App Engine will continue to add tokens to the bucket at a rate of 20/s, but only if there is room in the bucket (bucket_size). Once there are 40 tokens in the bucket, it will stop until some of the running processes finish and there is more room.
After the initial burst of 40 tasks is finished, there should never be more than 10 tasks executing at a time.
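To make the interplay concrete, here is a toy, second-by-second simulation in Python. The 5-second task latency and the loop structure are assumptions for illustration, this is not App Engine's real scheduler, and unlike the burst described above it applies the concurrency cap from the very first tick:

    bucket_size = 40          # maximum tokens the bucket can hold
    rate = 20                 # tokens added per second
    max_concurrent = 10       # cap on tasks running at the same time
    task_latency = 5          # assumed seconds each task takes to finish

    tokens = bucket_size      # the bucket starts full, allowing the initial burst
    running = []              # remaining seconds of each in-flight task

    for t in range(10):
        # age in-flight tasks and drop the ones that finished
        running = [r - 1 for r in running if r - 1 > 0]
        # start as many tasks as tokens and the concurrency cap allow
        can_start = min(tokens, max(0, max_concurrent - len(running)))
        running += [task_latency] * can_start
        tokens -= can_start
        # refill the bucket at the configured rate, never exceeding bucket_size
        tokens = min(bucket_size, tokens + rate)
        print(f"t={t}s started={can_start} running={len(running)} tokens={tokens}")

With a 5-second latency the throttled queue settles at 10 tasks in flight, i.e. an effective throughput of 2 tasks per second, which is exactly the protection max_concurrent_requests is meant to provide.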

In Qualtrics, how to customize timer length?

I'm hoping to set up a survey in Qualtrics which will be fixed to last 30 minutes for every participant. This is due to the majority of the survey consisting of audio prompts which are played on a fixed schedule (and using timers to auto-advance to the next audio prompt).
My problem is that there are a few instances in which participants are asked to complete blocks of questions about what they just listened to, and obviously people will differ in the amount of time they take to complete these sections. I was hoping I could somehow track the time (in seconds) a participant spends on these self-report sections, then have a timer page at the end of the self-report that delays participants from advancing for an amount of time based on how long they took to finish the self-report.
For example, let's say after listening to blocks 1,2, and 3 (which are all timed audio), I want all participants to spend a total of 3 minutes on blocks 4,5, and 6 (which consist of self-report questions) before moving to block 7. If John finishes blocks 4,5, and 6, in 2.5 minutes, I'd then like John to wait for 30 seconds before continuing to 7. If Sally finishes blocks 4,5, and 6 in 2 minutes, I'd like her to wait 60 seconds before continuing.
Hope that makes sense, and greatly appreciate any advice!
The variable ${e://Field/Q_TotalDuration} always contains the current number of seconds since the beginning of the survey.
You can add JavaScript to the last question in Block 6 where you pipe in Q_TotalDuration and hide the Next button until you hit the time limit, then show the Next button.
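The calculation that script needs is just the gap between a target total elapsed time and Q_TotalDuration when the participant reaches the end of Block 6. A sketch of the arithmetic with made-up numbers (the 600-second audio offset and the 180-second self-report budget are assumptions matching the example in the question):

    # Hypothetical schedule: blocks 1-3 are fixed-length audio ending at 600 s,
    # and blocks 4-6 should always total 180 s before block 7 is shown.
    target_total = 600 + 180
    q_total_duration = 750          # piped-in ${e://Field/Q_TotalDuration} at the end of block 6

    wait_seconds = max(0, target_total - q_total_duration)
    print(wait_seconds)             # 30 -> keep the Next button hidden this long

In the survey itself this logic would live in the question's JavaScript, using the piped value to decide how long to keep the Next button hidden.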

Is there an easy way to get the percentage of successful reads of last x minutes?

I have a setup with a BeagleBone Black which communicates over I²C with its slaves every second and reads data from them. Sometimes the I²C readout fails, though, and I want to gather statistics about these failures.
I would like to implement an algorithm which displays the percentage of successful communications over the last 5 minutes (up to 24 hours) and updates that value constantly. If I implemented that 'normally', with an array storing success/no success for every second, that would mean a lot of wasted RAM/CPU for a minor feature (especially if I wanted to see the statistics of the last 24 hours).
Does someone know a good way to do that, or can anyone point me in the right direction?
Why don't you just implement a low-pass filter? For every successful transfer you push in a 1, for every failed one a 0; the result is a number between 0 and 1. Assuming that your transfers happen periodically, this works well -- and you just have to adjust the cutoff frequency of that filter to your desired "averaging duration".
However, I can't follow your RAM argument: assuming you store one byte representing success or failure per transfer, which you say happens every second, you end up with 86400 B per day -- roughly 85 KB/day, which is really negligible.
EDIT Cutoff frequency is something from signal theory and describes the highest or lowest frequency that passes a low or high pass filter.
Implementing a low-pass filter is trivial; a runnable Python version (the transfer function is a placeholder for your real I²C read) looks like this:

    alpha = 0.001        # smoothing factor; smaller = longer effective memory
    success_rate = 1.0   # init with no failed transfers

    while True:
        # returns 1 on success, 0 on failure (placeholder for the real transfer)
        success = do_transfer_and_return_1_on_success_or_0_on_failure()
        success_rate = alpha * success + (1 - alpha) * success_rate
That's a single-tap IIR (infinite impulse response) filter; single tap because there's only one alpha and thus, only one number that is stored as state.
EDIT2: the value of alpha defines the behaviour of this filter.
EDIT3: you can use a filter design tool to give you the right alpha; just set your low pass filter's cutoff frequency to something like 0.5/integrationLengthInSamples, select an order of 0 for the IIR and use an elliptic design method (most tools default to butterworth, but 0 order butterworths don't do a thing).
I'd use scipy and convert the resulting (b,a) tuple (a will be 1, here) to the correct form for this feedback form.
UPDATE In light of the comment by the OP 'determine a trend of which devices are failing' I would recommend the geometric average that Marcus Müller ꕺꕺ put forward.
ACCURATE METHOD
The method below is aimed at obtaining 'well defined' statistics for performance over time that are also useful for 'after the fact' analysis.
Notice that the geometric average 'looks back' over a number of recent messages rather than over a fixed time period.
Maintain a rolling array of 24*60/5 = 288 'prior success rates' (SR[i] with i=-1, -2,...,-288) each representing a 5 minute interval in the preceding 24 hours.
That will consume about 2.5K if the elements are 64-bit doubles.
To 'effect' constant updating use an Estimated 'Current' Success Rate as follows:
ECSR = (t*S/M+(300-t)*SR[-1])/300
Where S and M are the counts of successes and messages in the current (partially complete) period, and SR[-1] is the success rate of the previous (now complete) bucket.
t is the number of seconds expired of the current bucket.
NB: When you start up you need to use 300*S/M/t.
In essence the approximation assumes the error rate was steady over the preceding 5 - 10 minutes.
To 'effect' a 24 hour look back you can either 'shuffle' the data down (by copy or memcpy()) at the end of each 5 minute interval, or implement a 'circular array' by keeping track of the current bucket index.
NB: For many management/diagnostic purposes intervals of 15 minutes are often entirely adequate. You might want to make the 'grain' configurable.
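A minimal Python sketch of the scheme above, assuming the circular-array variant; the names are illustrative, and for simplicity the start-up case falls back to the raw current rate instead of the scaling in the NB:

    import time

    BUCKET_SECONDS = 300
    NUM_BUCKETS = 24 * 60 // 5             # 288 buckets cover 24 hours

    success_rates = [None] * NUM_BUCKETS   # circular array of prior success rates SR[i]
    index = 0                              # current position in the circular array
    successes = 0                          # S: successes in the current bucket
    messages = 0                           # M: messages in the current bucket
    bucket_start = time.time()

    def record_result(ok):
        """Call once per transfer with True on success, False on failure."""
        global successes, messages, index, bucket_start
        if time.time() - bucket_start >= BUCKET_SECONDS:
            # close the current bucket: store its success rate and move on
            success_rates[index] = successes / messages if messages else None
            index = (index + 1) % NUM_BUCKETS
            successes = messages = 0
            bucket_start = time.time()
        messages += 1
        successes += int(ok)

    def ecsr():
        """Estimated 'current' success rate, blending the open bucket with the last closed one."""
        t = time.time() - bucket_start
        current = successes / messages if messages else 1.0
        prev = success_rates[(index - 1) % NUM_BUCKETS]
        if prev is None:                   # start-up: no completed bucket yet
            return current
        return (t * current + (BUCKET_SECONDS - t) * prev) / BUCKET_SECONDS

This keeps only 288 values plus a couple of counters, in line with the roughly 2.5K estimated above for 64-bit doubles.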

What's the best way to store elapsed times in a database

I'm working on a horse racing application and need to store elapsed times from races in a table. I will be importing data from a comma-delimited file that provides the final time in one format and the interior elapsed times in another. The following is an example:
Final Time: 109.39 (1 minute, 9 seconds and 39/100th seconds)
Quarter Time: 2260 (21 seconds and 60/100th seconds)
Half Time: 4524 (45 seconds and 24/100th seconds)
Three Quarters: 5993 (59 seconds and 93/100th seconds)
I'll want to have the flexibility to easily do things like feet per seconds calculations and to convert elapsed times to splits. I'll also want to be able to easily display the times (elapsed or splits) in fifth of seconds or in hundredths.
Times in fifths: :223 :451 :564 1:091 (note the last digits are superscripts)
Times in hundredths: 22.60 :45.24 :56.93 1:09.39
Thanks in advance for your input.
Generally timespans are either stored as (1) seconds elapsed or (2) start / end datetime. Seconds elapsed can be an integer or a float / double if you require it. You could be creative / crazy and store all times as milliseconds in which case you'd only need an integer.
If you are using PostgreSQL, you can use interval datatype. Otherwise, any integer (int4, int8) or number your database supports is OK. Of course, store values on a single unit of measure: seconds, minutes, milliseconds.
It all depends on how you intend to use it, but number of elapsed seconds (perhaps as a float if necessary) is certainly a favorite.
I think the 109.39 representing 1 min 9.39 sec is pretty silly. Unambiguous, sure, historical tradition maybe, but it's miserable to do computations with that format. (Not impossible, but fixing it during import sounds easy.)
I'd store time in a decimal format of some sort -- either an integer representing hundredths of a second, as all your other times are displayed, or a database-specific decimal-aware format.
Standard floating point representations might eventually lead you to wonder why a horse that ran two laps in 20.1 seconds each took 40.200035 seconds to run both laps combined.
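If you do store everything as an integer number of hundredths of a second, as suggested above, the two import formats and the hundredths display format convert in a few lines of Python (the function names are just for illustration):

    def parse_final_time(s):
        """'109.39' means 1 min 9.39 s -> 6939 hundredths."""
        whole, hundredths = s.split(".")
        minutes, seconds = divmod(int(whole), 100)   # '109' -> 1 min, 9 s
        return (minutes * 60 + seconds) * 100 + int(hundredths)

    def parse_interior_time(s):
        """'2260' means 22.60 s, i.e. it is already in hundredths."""
        return int(s)

    def format_hundredths(h):
        """6939 -> '1:09.39', 2260 -> '22.60'."""
        minutes, rest = divmod(h, 6000)
        seconds, hundredths = divmod(rest, 100)
        if minutes:
            return f"{minutes}:{seconds:02d}.{hundredths:02d}"
        return f"{seconds}.{hundredths:02d}"

    print(parse_final_time("109.39"))    # 6939
    print(parse_interior_time("4524"))   # 4524
    print(format_hundredths(6939))       # 1:09.39

Splits and feet-per-second calculations then become plain integer arithmetic, and you can add a second formatter for the fifths display without touching the stored values.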
