A performance issue resulting from "limit 0" in TDengine database

limit 0 is suspected to cause a full-table-query bug.
After switching from 2.6.0.32 to 3.0.2.1 today, we found that CPU usage on all three nodes (each node has a 32-core CPU) exceeded 90%, while in the original 2.6.0.32 environment CPU usage never exceeded 10%. The figure below shows one of the nodes.
Checking with show queries, we found two statements of the form select * from t XXX limit 0;
Comparing the two environments, the time difference is more than 30,000x. The 3.0.2.1 timing is as follows:
The 2.6.0.32 timing is as follows:
Next, after changing the statement to select * from t XXX limit 1 in the 3.0.2.1 environment, the elapsed time dropped from 74 seconds to 0.03 seconds and everything returned to normal, as shown in the figure below.
Finally, here is a comparison chart of the two environments (after the CPU usage of 3.0 dropped, query speed improved).
In addition, the configuration and table structure of the two environments are identical. Looking at the details, the 18 ms query time in 2.6.0.32 is still lower than the 30 ms of 3.0.2.1.
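A quick way to reproduce the comparison is to time both statements from a client. Below is a minimal sketch using the taospy Python connector; the connection parameters and the table name t_xxx are placeholders, not values from the original report:

import time
import taos  # TDengine Python connector (taospy)

conn = taos.connect(host="localhost", user="root", password="taosdata", database="db")
cursor = conn.cursor()

for sql in ("select * from t_xxx limit 0", "select * from t_xxx limit 1"):
    start = time.perf_counter()
    cursor.execute(sql)   # run the statement
    cursor.fetchall()     # drain the result set so the timing covers the whole query
    print("%s took %.3f s" % (sql, time.perf_counter() - start))

cursor.close()
conn.close()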

Related

Flink Dashboard Throughput doesn't add up

I have two operators, a source and a map. The incoming throughput of the map is stuck at just above 6K messages/s, whereas the message count reaches the size of the whole stream (~350K) in under 20 s (see duration). 350,000/20 means I have a throughput of at least 17,500 messages/s, not the ~6,000 Flink suggests! What's going on here?
as shown in the picture:
start time = 13:10:29
all messages are already read by = 13:10:46 (less than 20s)
I checked the Flink library code, and it seems that the numRecordsOutPerSecond statistic (as well as the other similar ones) operates on a window. This means it displays the average throughput of only the last X seconds, not the average throughput of the whole execution.
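To see why the two numbers differ, here is a small standalone sketch (not Flink code) contrasting a windowed per-second rate, similar in spirit to Flink's meter metrics, with the whole-run average; the 60-second window length is an assumption for illustration:

from collections import deque

WINDOW = 60                     # seconds covered by the windowed rate (assumed)
counts = deque(maxlen=WINDOW)   # records processed in each of the last WINDOW seconds

def record_second(processed_this_second):
    counts.append(processed_this_second)

def windowed_rate():
    # what a dashboard meter reports: average over the last WINDOW seconds only
    return sum(counts) / max(len(counts), 1)

def overall_rate(total_records, elapsed_seconds):
    # the number the question expects: whole-run average
    return total_records / elapsed_seconds

# e.g. ~350,000 records read in 17 s, then the job sits idle for 43 more seconds
for _ in range(17):
    record_second(350_000 // 17)
for _ in range(43):
    record_second(0)

print(windowed_rate())            # ~5,833/s, close to the ~6K shown in the UI
print(overall_rate(350_000, 17))  # ~20,588/s, the whole-run throughput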

Getting JMeter to work with Throughput Shaping timer and Concurrency Thread Group

I am trying to shape a JMeter test involving a Concurrency Thread Group and a Throughput Shaping Timer, as documented here and here. The timer is configured to run ten ramps and stages with RPS from 1 to 333.
I want the Concurrency Thread Group to use the schedule feedback function, so I added the formula in the Target Concurrency field (I have updated the example from tst-name to the actual timer name). Ramp-Up Time and Ramp-Up Steps Count I set to 1, as I assume these properties are not that important when the throughput is managed by the timer; the Hold Target Rate Time is 8000, which is longer than the total of the steps defined in the timer (6200).
When I run the test, it ends without any exceptions within 3 seconds or so. The log file shows a few rows about starting and ending threads, but nothing alarming.
The only thing I find suspicious is the log entry "VirtualUserController: Test limit reached, thread is done", plus the thread name.
I am not getting enough clues from the documentation linked here to figure this out myself; do you have any hints?
According to the documentation, Ramp-Up Time and Ramp-Up Steps Count should be blank:
"When using this approach, leave Concurrency Thread Group Ramp Up Time and Ramp-Up Steps Count fields blank"
So your assumption that setting them to 1 is OK seems false...
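For reference, a setup along the lines of the plugin documentation would look roughly like the sketch below; the timer name my-shaper and the feedback arguments (starting concurrency, maximum concurrency, spare threads) are illustrative, not taken from your test plan:

Throughput Shaping Timer name: my-shaper
Concurrency Thread Group:
  Target Concurrency:    ${__tstFeedback(my-shaper,1,1000,10)}
  Ramp Up Time:          (blank)
  Ramp-Up Steps Count:   (blank)
  Hold Target Rate Time: 8000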

Why is total time taken by Google Dataflow more than sum of times taken by individual steps

I am really unable to understand why the total elapsed time for a Dataflow job is so much higher than the time taken by the individual steps.
For example, the total elapsed time for the Dataflow job in the picture is 2 min 39 sec, while the time spent in individual steps is just 10 sec. Even if we account for the setup and teardown phases, there is a difference of 149 sec, which is too much.
Is there some other way of reading the individual stage timings, or am I missing something else?
Thanks
In my view, 2 min 39 sec is fine. You are reading a file, then applying a ParDo, and then writing to BigQuery.
There are a lot of factors involved in this time:
How much data you need to process - in your case, not much.
What computation you are doing - your ParDo step takes only 3 sec, so beyond the small amount of data there is not much computation either.
Writing to BigQuery - in your case it takes only 5 sec.
The creation and teardown phases of a Dataflow job are roughly constant; in your case they account for the 149 sec. Your actual processing takes only 10 sec, and that part depends on the three factors explained above.
Now assume you had to process 2 million records and each record took 10 sec to transform. The time would then be much higher, roughly 10 sec * 2 million records for a single-node Dataflow job.
In that case, the 149 sec of fixed overhead is insignificant compared with the whole job completion time.
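To make the back-of-the-envelope reasoning above concrete, here is a tiny model; the 149 sec overhead and the per-record times are just the figures from this discussion, and the single-worker assumption is for illustration:

def estimated_job_time(records, sec_per_record, workers=1, fixed_overhead=149):
    # fixed_overhead covers worker startup/teardown; processing scales with the data
    return fixed_overhead + (records * sec_per_record) / workers

# The job in the question: a tiny amount of data, ~10 s of real processing.
print(estimated_job_time(records=1, sec_per_record=10))          # ~159 s, overhead dominates
# The hypothetical 2-million-record job on a single worker:
print(estimated_job_time(records=2_000_000, sec_per_record=10))  # ~20,000,149 s, overhead negligible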
Hope this information helps you to understand the timing.

How should I evaluate the insert benchmark from CrateDB?

I am trying to understand and interpret the benchmark provided by CrateDB (https://staging.crate.io/benchmark/).
I am interested in how many elements can be inserted during one second.
I know that this may vary with the size of the tuples, so let's assume I have the same element sizes as CrateDB uses in their example.
They provide an example of bulk insertion where it takes, on average, 50 milliseconds to insert a bulk of 10,000 (integer/string pairs).
Now, can I conclude that it is possible to insert 20 bulks of 10,000 pairs during 1 s (1000 milliseconds)?
1000 ms / 50 ms = 20 -> 20 * 10,000 = 200,000 integer/string pairs per second
Can I say how the result would differ if I have 7 integers and 2 decimal(7,4) columns?
Well, this: https://staging.crate.io/benchmark/ is only comparable to itself, so it shows whether code changes/features made CrateDB slower or faster. It's not a reliable source for actual benchmarking and won't give you comparable numbers (among other things, because the setup is vanilla).
As for your question, I recommend running your own benchmarks to satisfy whatever requirements you have. 😬
The tool we used for these benchmarks is cr8, from one of our core devs!
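If you want a quick measurement without cr8, a rough sketch using the crate Python client could look like this; the connection URL, table name, and column layout are placeholders, so adapt them to your schema of 7 integers and 2 decimals:

import time
from crate import client  # pip install crate

conn = client.connect("http://localhost:4200")
cursor = conn.cursor()

bulk = [(i, "value-%d" % i) for i in range(10_000)]  # one bulk of 10,000 pairs
start = time.perf_counter()
cursor.executemany("INSERT INTO my_table (id, name) VALUES (?, ?)", bulk)
elapsed = time.perf_counter() - start

print("bulk of %d rows took %.3f s -> ~%.0f rows/s" % (len(bulk), elapsed, len(bulk) / elapsed))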
Cheers, Claus

Is there an easy way to get the percentage of successful reads of last x minutes?

I have a setup with a BeagleBone Black which communicates over I²C with its slaves every second and reads data from them. Sometimes the I²C readout fails, though, and I want to gather statistics about these failures.
I would like to implement an algorithm that displays the percentage of successful communications over the last 5 minutes (up to 24 hours) and updates that value constantly. If I implemented that 'normally' with an array where I store success/no-success for every second, that would mean a lot of wasted RAM/CPU load for a minor feature (especially if I want to see the statistics of the last 24 hours).
Does someone know a good way to do that, or can anyone point me in the right direction?
Why don't you just implement a low-pass filter? For every successful transfer you push in a 1, for every failed one a 0; the result is a number between 0 and 1. Assuming that your transfers happen periodically, this works well -- and you just have to adjust the cutoff frequency of that filter to your desired "averaging duration".
However, I can't follow your RAM argument: assuming you store one byte representing success or failure per transfer, which you say happens every second, you end up with 86400 B per day -- ~85 KB/day is really negligible.
EDIT Cutoff frequency is something from signal theory and describes the highest or lowest frequency that passes a low or high pass filter.
Implementing a low-pass filter is trivial; in Python, for example:
new_val = 1.0  # init with no failed transfers
alpha = 0.001  # smoothing factor: smaller alpha means a longer averaging window

while True:
    old_val = new_val
    # the I2C read, returning 1 on success and 0 on failure
    success = do_transfer_and_return_1_on_success_or_0_on_failure()
    new_val = alpha * success + (1 - alpha) * old_val
That's a single-tap IIR (infinite impulse response) filter; single tap because there's only one alpha and thus, only one number that is stored as state.
EDIT2: the value of alpha defines the behaviour of this filter.
EDIT3: you can use a filter design tool to give you the right alpha; just set your low-pass filter's cutoff frequency to something like 0.5/integrationLengthInSamples, select an order of 0 for the IIR, and use an elliptic design method (most tools default to Butterworth, but 0-order Butterworths don't do anything).
I'd use scipy and convert the resulting (b, a) tuple (a will be 1 here) to the correct form for the feedback formula above.
UPDATE: In light of the OP's comment 'determine a trend of which devices are failing', I would recommend the geometric average that Marcus Müller ꕺꕺ put forward.
ACCURATE METHOD
The method below is aimed at obtaining 'well defined' statistics for performance over time that are also useful for 'after the fact' analysis.
Notice that the geometric average 'looks back' over recent messages rather than over a fixed time period.
Maintain a rolling array of 24*60/5 = 288 'prior success rates' (SR[i] with i = -1, -2, ..., -288), each representing a 5-minute interval in the preceding 24 hours.
That will consume about 2.5K if the elements are 64-bit doubles.
To 'effect' constant updating, use an Estimated 'Current' Success Rate (ECSR) as follows:
ECSR = (t*S/M + (300-t)*SR[-1]) / 300
where S and M are the counts of successes and messages in the current (partially complete) period, SR[-1] is the previous (now complete) bucket,
and t is the number of seconds elapsed in the current bucket.
NB: When you start up you need to use 300*S/M/t.
In essence the approximation assumes the error rate was steady over the preceding 5 - 10 minutes.
To 'effect' a 24-hour look back, you can either 'shuffle' the data down (by copy or memcpy()) at the end of each 5-minute interval, or implement a circular array by keeping track of the current bucket index.
NB: For many management/diagnostic purposes intervals of 15 minutes are often entirely adequate. You might want to make the 'grain' configurable.
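Here is a minimal sketch of the bucket approach in Python; the 300-second bucket length and the 288-bucket history follow the description above, record() is a hypothetical hook you would call once per transfer, and the start-up case simply falls back to the partial rate:

from collections import deque

BUCKET_SECONDS = 300         # 5-minute buckets
history = deque(maxlen=288)  # SR[-1] ... SR[-288]: success rates of the last 24 hours
successes = 0                # S: successes in the current, partial bucket
messages = 0                 # M: messages in the current, partial bucket
elapsed = 0                  # t: seconds elapsed in the current bucket

def record(success: bool):
    # call once per transfer (i.e. once per second)
    global successes, messages, elapsed
    successes += int(success)
    messages += 1
    elapsed += 1
    if elapsed == BUCKET_SECONDS:  # bucket complete: roll it into the history
        history.appendleft(successes / messages)
        successes = messages = elapsed = 0

def ecsr():
    # Estimated 'Current' Success Rate: blend the partial bucket with SR[-1]
    current = successes / messages if messages else 1.0
    if not history:  # start-up: no completed bucket yet, use the partial rate alone
        return current
    return (elapsed * current + (BUCKET_SECONDS - elapsed) * history[0]) / BUCKET_SECONDS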
