How is Average Seek Time Calculated?

A hard disk system has the following parameters:
Number of tracks = 500
Number of sectors/track = 100
Number of bytes/sector = 500
Time taken by the head to move from one track to adjacent track = 1 ms
Rotation speed = 600 rpm.
What is the average time taken for transferring 250 bytes from the disk ?
Well, I wanted to know how the average seek time is calculated.
My Approach
Avg. time to transfer = Avg. seek time + Avg. rotational delay + Data transfer time
Avg Seek Time
Given that the time to move between adjacent tracks is 1 ms:
time to move from track 1 to track 1 : 0ms
time to move from track 1 to track 2 : 1ms
time to move from track 1 to track 3 : 2ms
..
..
time to move from track 1 to track 500 : 499 ms
Avg seek time = (0 + 1 + 2 + ... + 499) / 500 = 249.5 ms
But after reading the answer given here, Why is average disk seek time one-third of the full seek time?, I'm confused about my approach.
My questions are:
Is my approach correct?
If not, please explain the correct way to calculate average seek time.
If yes, please explain why we are not considering the average over every possible pair of tracks (as mentioned in the above link)?

There are a lot more than 500 possible seek times. Your method only accounts for seeks starting at track 1.
What about seeks starting from track 2? Or from track 285?
I wouldn't say your approach is wrong, but it's certainly incomplete.

As pointed out in the link you're referring to in this question, the average seek time is calculated over the distance from ANY track to ANY track, not just from track 1. So you have to add all of the subsums (one for each starting track) to the one you are using, and then divide by the number of track pairs. It works out to N/3, where N is the distance between track 0 and the last track.
E.g. the average distance from track 249 (the middle track) to ANY other track is the smallest of these subsums, roughly N/4.
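A quick brute-force check of that claim (a sketch using the question's 500 tracks and 1 ms per track crossed):
# Brute-force check: average the seek distance over ALL ordered (from, to)
# track pairs, not just those starting at track 1. Parameters from the question.
tracks = 500
total = sum(abs(a - b) for a in range(tracks) for b in range(tracks))
avg_tracks_crossed = total / tracks ** 2
print(avg_tracks_crossed)              # ~166.7: about one third of the 499-track full stroke
print(avg_tracks_crossed * 1.0, "ms")  # at 1 ms per track, ~166.7 ms average seek time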

Your calculation is only the average track seek; you need to add the sector seek to that.
When seeking for a read operation, the head is positioned on (a) a track, at a given (b) sector.
The (average) seek time is the time taken to move from that position to any other position, covering both (a) track and (b) sector.
When positioned, the read can start.
The disk RPM comes into play here: if it spins at 600 rpm and has 100 sectors per track, it seeks sectors at
60000 ms (one minute) / 600 rpm (disk spin speed) / 100 sectors (per track) = 1 ms (to move from one sector to the adjacent one)
Normally, you would have to consider that as you change tracks, the disk is still spinning and thus account for the sector offset change. But since we are interested only in the average, this cancels out (hopefully).
So, to your 249.5 ms average track seek time, you need to add (same formula):
(0 + 1 + ... + 100) / 100 * 1 ms (sector seek time) = 50.5 ms
Thus, the average seek time for both track and sector is 300 ms.
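Interestingly, the more common textbook decomposition (average rotational latency = half a revolution) lands on the same 300 ms for the original 250-byte question. A sketch, using the 249.5 ms seek figure from the question's own method:
# Sketch: average time to transfer 250 bytes, with the question's numbers.
rpm = 600
rev_ms = 60_000 / rpm                     # 100 ms per revolution
avg_rotational_ms = rev_ms / 2            # wait half a revolution on average: 50 ms
track_bytes = 100 * 500                   # 100 sectors x 500 bytes pass per revolution
transfer_ms = 250 / track_bytes * rev_ms  # 0.5 ms to stream 250 bytes
avg_seek_ms = 249.5                       # from the calculation in the question
print(avg_seek_ms + avg_rotational_ms + transfer_ms)   # 300.0 ms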

Related

Is there a possible way to get data sampling in continuous data batches?

We have a data stream that continuously dumps data into our data lake. Is there a good solution, with minimal running cost, to get a 10% random data sample from the data?
I'm currently using the code snippet below, but the sample will outgrow 10% of the total as new batches arrive. I've also tried sampling 10 batches of 100 records each with a 0.1 mean, but it resulted in ~32% sampling.
-- flag ~10% of rows: uniform(0,1) seeded by random(1), true when below 0.10
select id,
       (uniform(0::float, 1::float, random(1)) < .10)::boolean as sampling
from temp_hh_mstr;
Prior to that, I thought about sampling via Snowflake's TABLESAMPLE, subtracting the IDs already in the sample from the total count, but that requires a recalculation every time a batch arrives, which will increase the cost.
Some additional references I've been thinking about:
Wilson Score Interval With Continuity Correction
Binomial Confidence Interval
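One technique worth considering (my suggestion, not from the post): make the in/out decision a deterministic function of the row id, so each row's decision never changes and the sample stays at ~10% no matter how many batches arrive. A minimal Python sketch; the ids and threshold are hypothetical:
import hashlib

def in_sample(row_id, percent=10):
    # Hash the id to a stable bucket in [0, 100); rows below `percent` are in.
    bucket = int(hashlib.md5(str(row_id).encode()).hexdigest(), 16) % 100
    return bucket < percent

batch = ["id-1", "id-2", "id-42"]
sampled = [r for r in batch if in_sample(r)]
The same predicate can be pushed into SQL as a hash of id modulo 100, which avoids re-sampling already-seen rows on every new batch.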

Is there an easy way to get the percentage of successful reads of the last x minutes?

I have a setup with a Beaglebone Black which communicates over I²C with its slaves every second and reads data from them. Sometimes the I²C readout fails, though, and I want to gather statistics about these failures.
I would like to implement an algorithm that displays the percentage of successful communications of the last 5 minutes (up to 24 hours) and updates that value constantly. If I implemented that 'normally', with an array storing success/no success for every second, that would mean a lot of wasted RAM/CPU load for a minor feature (especially if I want to see the statistics of the last 24 hours).
Does someone know a good way to do that, or can anyone point me in the right direction?
Why don't you just implement a low-pass filter? For every successful transfer you push in a 1, for every failed one a 0; the result is a number between 0 and 1. Assuming that your transfers happen periodically, this works well -- you just have to adjust the cutoff frequency of that filter to your desired "averaging duration".
However, I can't follow your RAM argument: assuming you store one byte representing success or failure per transfer, which you say happens every second, you end up with 86400B per day -- 85KB/day is really negligible.
EDIT Cutoff frequency is something from signal theory and describes the highest or lowest frequency that passes a low or high pass filter.
Implementing a low-pass filter is trivial; something like (pseudocode):
new_val = 1    # init with no failed transfers
alpha = 0.001  # smoothing factor
while True:
    old_val = new_val
    success = do_transfer_and_return_1_on_success_or_0_on_failure()
    new_val = alpha * success + (1 - alpha) * old_val
That's a single-tap IIR (infinite impulse response) filter; single tap because there's only one alpha and thus, only one number that is stored as state.
EDIT2: the value of alpha defines the behaviour of this filter: the smaller alpha is, the longer the effective averaging window.
EDIT3: you can use a filter design tool to give you the right alpha; just set your low pass filter's cutoff frequency to something like 0.5/integrationLengthInSamples, select an order of 0 for the IIR and use an elliptic design method (most tools default to butterworth, but 0 order butterworths don't do a thing).
I'd use scipy and convert the resulting (b,a) tuple (a will be 1, here) to the correct form for this feedback form.
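If you'd rather not reach for a filter-design tool, a common shortcut (my suggestion, not from the answer above) is to derive alpha from the desired averaging length via the usual EWMA span rule alpha = 2/(span + 1). A minimal sketch:
# Single-tap IIR (an exponentially weighted moving average). For a 5-minute
# view of 1 Hz transfers, span = 300 samples.
span = 300
alpha = 2 / (span + 1)   # ~0.0066

rate = 1.0               # start assuming no failed transfers
def update(success):
    global rate
    rate = alpha * (1.0 if success else 0.0) + (1.0 - alpha) * rate
    return rate          # current smoothed success fraction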
UPDATE In light of the comment by the OP 'determine a trend of which devices are failing' I would recommend the geometric average that Marcus Müller put forward.
ACCURATE METHOD
The method below is aimed at obtaining 'well defined' statistics for performance over time that are also useful for 'after the fact' analysis.
Notice that geometric average has a 'look back' over recent messages rather than fixed time period.
Maintain a rolling array of 24*60/5 = 288 'prior success rates' (SR[i] with i=-1, -2,...,-288) each representing a 5 minute interval in the preceding 24 hours.
That will consume about 2.3K if the elements are 64-bit doubles (288 × 8 = 2304 bytes).
To 'effect' constant updating use an Estimated 'Current' Success Rate as follows:
ECSR = (t*S/M + (300-t)*SR[-1]) / 300
where S and M are the counts of successes and messages in the current (partially complete) period, SR[-1] is the previous (now complete) bucket,
and t is the number of seconds elapsed in the current bucket.
NB: when you start up, before the first bucket completes, just use S/M.
In essence the approximation assumes the error rate was steady over the preceding 5 - 10 minutes.
To 'effect' a 24-hour look back you can either 'shuffle' the data down (by copy or memcpy()) at the end of each 5-minute interval, or implement a circular array by keeping track of the current bucket index.
NB: For many management/diagnostic purposes intervals of 15 minutes are often entirely adequate. You might want to make the 'grain' configurable.
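A minimal sketch of this scheme (assuming one message per second, as in the question; buckets start optimistically at 1.0):
# 288 five-minute buckets in a circular array plus the ECSR interpolation.
BUCKETS = 288                # 24 h / 5 min
PERIOD = 300                 # seconds per bucket

sr = [1.0] * BUCKETS         # prior success rates SR[i]
idx = 0                      # index of the bucket being filled
succ = msgs = t = 0          # S, M, and elapsed seconds for the partial bucket

def record(success):
    global succ, msgs, t, idx
    msgs += 1
    succ += 1 if success else 0
    t += 1
    if t >= PERIOD:                      # bucket complete: store it and move on
        sr[idx] = succ / msgs
        idx = (idx + 1) % BUCKETS
        succ = msgs = t = 0

def ecsr():
    prev = sr[(idx - 1) % BUCKETS]       # SR[-1], the last complete bucket
    if msgs == 0:
        return prev
    return (t * succ / msgs + (PERIOD - t) * prev) / PERIOD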

How do I use a PID controller?

I'm currently working on a temperature controller.
I have a Temperature_PID() function that returns the manipulated variable (which is the sum of the P, I, and D terms) but what do I do with this output?
The temperature is controlled by PWM, so 0% duty cycle = heater off and 100% duty cycle = heater on.
So far I tried
Duty_Cycle += Temperature_PID();
if(Duty_Cycle > 100) Duty_Cycle = 100;
else if(Duty_Cycle < 0) Duty_Cycle = 0;
This didn't work for me, because the I term basically makes this system very unstable. Imagine integrating an area, adding another small data point, integrating the area again, and summing the results, over and over. That means each data point makes this control scheme exponentially worse.
The other thing I would like to try is
Duty_Cycle = Expected_Duty_Cycle + Temperature_PID();
where Expected_Duty_Cycle is what the duty cycle should settle to once the controller reaches a stable point and Temperature_PID() is 0. However, this also doesn't work, because the Expected_Duty_Cycle would always be changing depending on the conditions around the heater, e.g. different weather.
So my question is what exactly do I do with the output of PID? I don't understand how to assign a duty cycle based on the PID output. Ideally this will stay at 100% duty cycle until the temperature almost reaches the set point and start dropping off to a lower duty cycle. But using my first method (with my I gain set to zero) it only starts lowering the duty cycle after it already overshoots.
This is my first post. Hope I find my answer. Thank you stackoverflow.
EDIT:
Here's my PID function.
double TempCtrl_PID(PID_Data *pid)
{
    Thermo_Data tc;
    double error, pTerm, iTerm, dTerm;

    Thermo_Read(CHIP_TC1, &tc);                  /* read the thermocouple */
    pid->last_pv = pid->pv;
    pid->pv = Thermo_Temperature(&tc);           /* process variable = measured temperature */
    error = pid->sp - pid->pv;                   /* setpoint minus measurement */
    if (error / pid->sp < 0.1)                   /* only accumulate near (or above) the setpoint */
        pid->err_sum += error;
    pTerm = pid->kp * error;
    iTerm = pid->ki * pid->err_sum;
    dTerm = pid->kd * (pid->last_pv - pid->pv);  /* derivative on measurement */
    return pTerm + iTerm + dTerm;
}
EDIT 2:
Never used this before so let me know if the link is broken.
https://picasaweb.google.com/113881440334423462633/January302013
Sorry, Excel is crashing on me when I try to rename axes or the title. Note: there isn't a fan in the system yet so I can't cool the heater as fast as I can get it to heat up, so it spends very little time below the set point compared to above.
The first picture is a simple on-off controller.
The second picture is my PD controller. As you can see, it takes a lot longer for the temperature to decrease, because the controller doesn't subtract before the temperature overshoots; it waits until the temperature overshoots before subtracting from the duty cycle, and does so too slowly. How exactly do I tell my controller to lower the duty cycle before it hits the max temperature?
The output of the PID is the duty cycle. You must adjust kp, ki, and kd to put the PID output in the range of the Duty_Cycle, e.g., 0 to 100. It is usually a good idea to explicitly limit the output in the PID function itself.
You should "tune" your PID in simple steps.
Turn off the integral and derivative terms (set ki and kd to zero)
Slowly increase your kp until a 10% step change in the setpoint makes the output oscillate
Reduce kp by 30% or so, which should eliminate the oscillations
Set ki to a fraction of kp and adjust to get your desired tradeoff of overshoot versus time to reach setpoint
Hopefully, you will not need kd, but if you do, make it smaller still
Your PID controller output should be setting the value of the duty cycle directly.
Basically you are going to be controlling the heater settings based on the difference in the actual temperature versus the temperature setpoint.
You will need to adjust the values of the PID parameters to obtain the performance you are looking for.
First, set I and D to zero and put in a value for P, say 2 to start.
Change the setpoint and see what your response is. Increase P, make another setpoint change, and see what happens. Eventually you will see the temperature oscillate consistently and never settle at any stable value. This gain is known as the "ultimate gain". Pay attention to the period of the oscillation as well. Set P equal to half of the ultimate gain.
Start with a value of 1.2 × (ultimate gain) / (oscillation period) for I and change the setpoint. Adjust the values of P and I from there to get where you want to go, tracking the process and seeing whether increasing or decreasing the values improves things.
Once you have P and I you can work on D but depending on the process dynamics giving a value for D might make your life worse.
The Ziegler-Nichols method gives you some guidelines for PID values which should get you in the ballpark. From there you can make adjustments to get better performance.
You will have to weigh the options of having overshoot with the amount of time the temperature takes to reach the new setpoint. The faster the temperature adjusts the more overshoot you will have. To have no overshoot will increase that amount of time considerably.
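To make those numbers concrete, a small sketch (Ku and Tu here are hypothetical measured values, not from the post):
# Ziegler-Nichols-style starting constants from the measured ultimate gain Ku
# and oscillation period Tu.
Ku = 8.0             # gain at which the loop oscillated steadily
Tu = 30.0            # period of that oscillation, in seconds

Kp = 0.5 * Ku        # "set P equal to half of the ultimate gain"
Ki = 1.2 * Ku / Tu   # the suggested starting value for I
print(Kp, Ki)        # 4.0, 0.32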
A few suggestions:
You seem to be integrating twice: once inside your TempCtrl_PID function and once outside, with Duty_Cycle +=. So now your P term is really an I term.
Start with only a P term and keep increasing it until the system becomes unstable. Then back off (e.g. use 1/2 to 1/4 the value where it becomes unstable) and start adding an I term. Start with very low values on the I term and then gradually increase. This process is a way of tuning the loop. Because the system will probably have a pretty long time constant this may be time consuming...
You can add some feed-forward as you suggest (the expected duty cycle for a given setpoint -- map it out by setting the duty cycle and letting the system stabilize). It doesn't matter if that term isn't perfect, since the loop will take out the remaining error. You could also simply add a constant bias to the duty cycle, but keep in mind a constant won't really make any difference, as the integrator will take it out; it will only affect a cold start.
Make sure you have some sort of fixed time base for this loop. E.g. make an adjustment every 10ms.
I would not worry about the D term for now. A PI controller should be good enough for most applications.
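Putting these suggestions together, a minimal sketch (not your exact code; the gains are placeholders to be tuned per the steps above, and read_temperature()/set_pwm() are hypothetical I/O): the clamped PID output is used directly as the duty cycle, integration happens only inside the controller, and the loop runs on a fixed time base.
KP, KI, KD = 4.0, 0.1, 0.0     # start P-only, then add I, per the tuning advice
DT = 0.1                       # fixed time base: run the loop every DT seconds

integral = 0.0
last_pv = None

def pid_duty_cycle(setpoint, pv):
    """Return a duty cycle in [0, 100]; the PID output IS the duty cycle."""
    global integral, last_pv
    error = setpoint - pv
    integral += error * DT
    deriv = 0.0 if last_pv is None else (last_pv - pv) / DT  # derivative on measurement
    last_pv = pv
    out = KP * error + KI * integral + KD * deriv
    if out > 100.0 or out < 0.0:
        integral -= error * DT          # simple anti-windup: undo integration when clamped
        out = min(max(out, 0.0), 100.0)
    return out

# Main loop (hypothetical I/O):
# while True:
#     set_pwm(pid_duty_cycle(target_temp, read_temperature()))
#     sleep(DT)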

How do I figure out provisioned throughput for an AWS DynamoDB table?

My system is supposed to write a large amount of data into a DynamoDB table every day. These writes come in bursts, i.e. at certain times each day several different processes have to dump their output data into the same table. Speed of writing is not critical as long as all the daily data gets written before the next dump occurs. I need to figure out the right way of calculating the provisioned capacity for my table.
So for simplicity, let's assume that I have only one process writing data once a day, and it has to write up to X items into the table (each item < 1KB). Is the capacity I would have to specify essentially equal to X / 24 / 3600 writes/second?
Thx
The provisioned capacity is in terms of writes/second. You need to make sure that you can handle the PEAK number of writes/second that you are going to expect, not the average over the day. So, if you have a single process that runs once a day and makes X number of writes, of Y size (in KB, rounded up), over Z number of seconds, your formula would be
capacity = (X * Y) / Z
So, say you had 100K writes over 100 seconds and each write < 1KB, you would need 1000 w/s capacity.
Note that in order to minimize provisioned write capacity needs, it is best to add data into the system on a more continuous basis, so as to reduce peaks in necessary read/write capacity.
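A sketch of that peak-based sizing rule (the helper name is mine; the numbers are from the answer's example):
import math

def write_capacity(writes, item_kb, seconds):
    size_units = math.ceil(item_kb)          # DynamoDB bills writes in 1KB units, rounded up
    return writes * size_units / seconds     # capacity = (X * Y) / Z

print(write_capacity(100_000, 0.8, 100))     # 1000.0 writes/second, as in the example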

B+ Tree CPU search time

I was just wondering: how would you calculate the worst-case time for an unclustered and a clustered B+ tree?
For example, say I had 1,000,000 records (1 row = 100 bytes), disk pages are 4000 bytes, a key is 20 bytes, and the access time of a page is 40 ms. How would I calculate the unclustered and clustered B+ tree worst-case times from these variables?
I know that to calculate the height/levels of a B+ tree you use (I think):
logF(keys)
where F = the fan-out, i.e. the number of branches per node.
With the height, you can use it to calculate the final worst-case time, but I don't know how to do that... I've tried searching around, but all I could find were times for average cases, or examples that weren't very clear.
Any help is appreciated!
I would say logF(keys) is the worst case for finding the leaf page, but after that the worst case would be an unclustered index with all of the rids pointing to different pages, which means
logF(keys) + N, N being the number of rids in the index node.
So in the end it would be:
H = height of the tree, which would be around 3 or 4.
H + N = 4 + (4000/20) = 204 I/Os
If, let's say, the pages are in memory and you want the CPU time, then it would be
CPU = 204 * 0.04 = 8.16 secs. Although 40 ms for moving a page in memory is quite a lot of time, I think (for disk reading it could make sense), but I think the calculations are fine.
