Suppose that a fast-food restaurant sells salad and burger. There are
two cashiers. With cashier 1, the number of seconds that it takes to
complete an order of salad is uniformly distributed in
{55,56,...,64,65}; and the number of seconds it takes to complete an
order of burger is uniformly distributed in {111,112,...,129,130}.
With cashier 2, the number of seconds that it takes to complete an
order of salad is uniformly distributed in {65,66,...,74,75}; and the
number of seconds it takes to complete an order of burger is uniformly
distributed in {121,122,...,139,140}. Assume that the customers
arrive at random times but have an average arrival rate of r customers
per minute.
Consider two different scenarios.
• Customers wait in one
line for service and, when either of the two cashiers is available, the
first customer in the line goes to that cashier and gets serviced. In
this scenario, when a customer arrives at the restaurant, he either
gets serviced if there is no line up, or waits at the end of the line.
• Customers wait in two lines, each for a cashier. The first customer
in a line will get serviced if and only if the cashier for his line
becomes available. In this scenario, when a customer arrives at the
restaurant, he joins the shorter line. In addition, we impose the
condition that if a customer joins a line, he will not move to the
other line or to the other cashier when the other line becomes shorter
or when the other cashier becomes free.
In both scenarios considered,
a cashier will only start serving the next customer when the customer
he is currently serving has received his ordered food. (That is the
point we call “the customer’s order is completed”.)
... Simulation
For
each of the two scenarios and for several choices of r (see later
description), you are to simulate the customers
arriving/waiting/getting service over a period of 3 hours, namely,
from time 0 to time 180 minutes, where you assume that at time 0 there
is no customer waiting and both cashiers are available. The entire
period of 3 hours is to be divided into time slots each of 1 second
duration. At each time slot, with r/60 probability, you make one new
customer arrive, and with 1 − r/60 probability you make no new
customer arrive. This should give rise to an average customer arrival
rate of r customers/minute, and the arrival model will be reasonably
close to what is described above. In each time slot, your program
should handle whatever processing is necessary.
... Objectives and
Deliverables
You need to write a program to investigate the following.
For each of the two scenarios and for each r, you are to divide the
three-hour simulated period into 10-minute periods, and for every
customer arriving during period i (i ∈ {1,2,...,18}), compute the
overall waiting time of the customer (namely, from the time he arrives
at the restaurant to the time when his order is completed). You need to
print for each i the average waiting time for the customers arriving
during period i. Note that if a customer arriving in period i has not
been served within the three-hour simulated period, then his waiting
time is not known. So the average waiting time for customers arriving
in this period cannot be computed. In that case, simply print “not
available” as the average waiting time for that period.
So, this program deals with hours, minutes, and seconds.
Would it be best to make a three-dimensional array as such:
time[3][60][60]
A total of three hours, with 60 minutes within, with 60 seconds within.
Alternatively, I was thinking that I should make a "for-loop" with this structure:
for (t = 0; t < 10800; t++)
Every iteration of this loop will represent one second of the three-hour simulation (3 h x 60 min/h x 60 s/min = 10800 seconds).
Am I on the right track here, guys? Which method is more sensible? Are there other arrays that are critical for this program?
Help is appreciated, as always!
It's almost always best to have your internal representation of time be in seconds; you'll have a much easier time working with your for loop than with a three-dimensional array. One nice convention is to write it as
const int MAX_SECONDS = 3 * 60 * 60;  /* 10800 one-second time slots */

for (t = 0; t < MAX_SECONDS; t++)
The data structure to look into for this project is, appropriately enough, a queue. This can be implemented using arrays, but will require some extra work.
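For illustration, here is a minimal array-backed circular queue in C; the names, the capacity, and the choice to store arrival times are my own assumptions, not part of the assignment:

#define QUEUE_CAP 10800                 /* worst case: one arrival per slot */

typedef struct {
    int items[QUEUE_CAP];               /* e.g. each customer's arrival time */
    int head, tail, count;
} Queue;

void queue_init(Queue *q) { q->head = q->tail = q->count = 0; }

int queue_push(Queue *q, int value) {   /* returns 0 when full */
    if (q->count == QUEUE_CAP) return 0;
    q->items[q->tail] = value;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 1;
}

int queue_pop(Queue *q, int *value) {   /* returns 0 when empty */
    if (q->count == 0) return 0;
    *value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 1;
}

If you push each customer's arrival time (in seconds), the waiting time falls out naturally: it is the completion time minus the value you popped.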
Related
Given the variable 'points', which increases every time a 'player' collects a point, how do I reward the user for collecting 30 points within a 5-minute limit? There's no countdown timer.
E.g. the player may have 4 points now, but if he has 34 points within 5 minutes, that also counts.
I was thinking about using timestamps but I don't really know how to do that.
What you are talking about is a "sliding window". Your window is time based. Record each point's timestamp and slide your window over these timestamps. You will need to pick a time increment to slide your window.
Upon each "slide", count your points. When you get the amount you need, "reward your user". The "upon each slide" means you need some sort of timer that calls a function each time to evaluate the result and do what you want.
For example, set a window of 5 minutes and a slide of 1 second. Don't keep a single variable called points. Instead, simply create an array of timestamps. On every timer tick (of 1 second in this case), count the number of timestamps that fall between t - 5 minutes and t now; if there are 30 or more, you've met your threshold and can reward your super-fast user. If you need the actual value (which may be 34), you've just computed it, so you can use it.
There may be ways to optimize this. I've provided the naive approach. Timestamps that have gone out of range can be deleted to save space.
If there are "points going into the window" that count, then just add them to the sum.
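A minimal sketch of that naive approach in C; the fixed-size array, the names, and the once-per-second polling are my own assumptions:

#include <time.h>

#define WINDOW_SECONDS (5 * 60)
#define THRESHOLD      30
#define MAX_EVENTS     100000

static time_t stamps[MAX_EVENTS];   /* one timestamp per point scored */
static int    n_stamps = 0;

void on_point_scored(void) {        /* call whenever the player scores */
    if (n_stamps < MAX_EVENTS)
        stamps[n_stamps++] = time(NULL);
}

/* Called by your 1-second timer: count the points inside the window.
   (Deleting out-of-range timestamps, as noted above, would save space.) */
int points_in_window(void) {
    time_t now = time(NULL);
    int count = 0;
    for (int i = 0; i < n_stamps; i++)
        if (now - stamps[i] <= WINDOW_SECONDS)
            count++;
    return count;                   /* reward the user if >= THRESHOLD */
}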
I'm writing a cron-like job-dispatcher that runs jobs every minute (and other jobs every 5 minutes, etc.). However, instead of immediately dispatching all the jobs that run on a particular period at the top-of-the-minute, I want to spawn them evenly over their periods.
For example, if I have N jobs that run every P minutes, rather than spawn them all at P:00, I want to spawn the jobs evenly over the P*60 seconds, i.e., ceil(N/(P*60)) jobs per second. Hence, the spawn time for each job would be "skewed" somewhat later.
However, for each job J, I want J to be spawned at the same skew every time it's dispatched so that the time between spawns for J is constant (and matches its period).
Each job has various information associated with it, including several strings that vary for each job. My original thought was to calculate a hash-code, H, for one or more of the strings and mod it by P*60 to calculate a constant skew, S, for each job. As long as the strings associated with the job remain the same, the calculated skew would remain constant.
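A minimal sketch of that idea in C, assuming FNV-1a as the string hash (my choice for illustration; the question doesn't prescribe one):

#include <stdint.h>

/* FNV-1a string hash -- an assumed choice; any stable hash will do. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;       /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/* Constant skew in [0, P*60) seconds for a job with period P minutes. */
static unsigned skew_seconds(const char *job_key, unsigned period_minutes) {
    return fnv1a(job_key) % (period_minutes * 60);
}

Whether the resulting skews are evenly distributed depends on how uniformly the hash spreads your particular strings over its 32-bit range.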
However, I would assume that S=H%(P*60) suffers from problems similar to those of using rand() (an uneven distribution that's biased towards lower numbers). However, I don't think the solutions presented there (to call rand() multiple times) would apply to my case, where I'm using a hash-code, because the hash function for a given job would always return the same hash.
So how can I get what I want? (I'm writing in C.)
Examples:
Suppose I have N every-minute jobs (a cron schedule of * * * * *). For N < 60 (let's say 2), job-1 might be skewed to start at :23 (23 seconds past the minute) and job-2 might be skewed to start at :37. With so few jobs, it may not seem evenly distributed. However, as N approached 60, the "gaps" would fill in (assuming a perfect skew function) so that one job would be spawned every second. If N passed 60, some jobs spawned at some seconds would "double up." Similarly, as N approached 120, the "gaps" would again fill in so that two jobs would be spawned every second. And so on.
Suppose I have N every-five-minute jobs (a cron schedule of */5 * * * *). In "normal" cron, that means "every five minutes, on the zeroth second, on the fives." I instead want that to mean "every five minutes, but not necessarily (and most likely not) on the zeroth second of some minute; the only guarantee is that the interval between spawns will be five minutes." So, for example, a particular job might be spawned at 00:07:24, 00:12:24, 00:17:24, etc. As N approached 300, one job would be spawned per second.
I'm hoping to set up a survey in Qualtrics which will be fixed to last 30 minutes for every participant. This is because the majority of the survey consists of audio prompts which are played on a fixed schedule (using timers to auto-advance to the next audio prompt).
My problem is that there are a few instances in which participants are asked to complete blocks of questions about what they just listened to, and obviously people will differ in the amount of time they take to complete these sections. I was hoping I could somehow track the time (in seconds) a participant spends on these self-report sections, then have a timer page at the end of the self-report, customized to delay participants from advancing but based on how long they took to finish the self-report.
For example, let's say after listening to blocks 1,2, and 3 (which are all timed audio), I want all participants to spend a total of 3 minutes on blocks 4,5, and 6 (which consist of self-report questions) before moving to block 7. If John finishes blocks 4,5, and 6, in 2.5 minutes, I'd then like John to wait for 30 seconds before continuing to 7. If Sally finishes blocks 4,5, and 6 in 2 minutes, I'd like her to wait 60 seconds before continuing.
Hope that makes sense, and greatly appreciate any advice!
The variable ${e://Field/Q_TotalDuration} always contains the current number of seconds since the beginning of the survey.
You can add JavaScript to the last question in Block 6 where you pipe in Q_TotalDuration and hide the Next button until the time limit is reached, then show the Next button.
I have a setup with a BeagleBone Black which communicates over I²C with its slaves every second and reads data from them. Sometimes the I²C readout fails, though, and I want to gather statistics about these failures.
I would like to implement an algorithm which displays the percentage of successful communications over the last 5 minutes (up to 24 hours) and updates that value constantly. If I implemented that 'normally', with an array storing the success/failure of every second, that would mean a lot of wasted RAM/CPU load for a minor feature (especially if I wanted to see the statistics of the last 24 hours).
Does someone know a good way to do that, or can anyone point me in the right direction?
Why don't you just implement a low-pass filter? For every successful transfer, you push in a 1, for every failed one a 0; the result is a number between 0 and 1. Assuming that your transfers happen periodically, this works well -- and you just have to adjust the cutoff frequency of that filter to your desired "averaging duration".
However, I can't follow your RAM argument: assuming you store one byte representing success or failure per transfer, which you say happens every second, you end up with 86400B per day -- 85KB/day is really negligible.
EDIT Cutoff frequency is something from signal theory and describes the highest or lowest frequency that passes a low or high pass filter.
Implementing a low-pass filter is trivial; in C it is just a few lines (the transfer function name is a stand-in for your own):

double val = 1.0;            /* init: assume no failed transfers yet */
const double alpha = 0.001;  /* smaller alpha = longer averaging duration */

while (1) {
    /* returns 1 on success, 0 on failure */
    int success = do_transfer_and_return_1_on_success_or_0_on_failure();
    val = alpha * success + (1.0 - alpha) * val;
}
That's a single-tap IIR (infinite impulse response) filter; single tap because there's only one alpha and thus, only one number that is stored as state.
EDIT2: the value of alpha defines the behaviour of this filter.
EDIT3: you can use a filter design tool to give you the right alpha; just set your low-pass filter's cutoff frequency to something like 0.5/integrationLengthInSamples, select an order of 0 for the IIR, and use an elliptic design method (most tools default to Butterworth, but zero-order Butterworths don't do a thing).
I'd use scipy and convert the resulting (b, a) tuple (a will be 1, here) to the correct form for the feedback formula above.
UPDATE In light of the comment by the OP, 'determine a trend of which devices are failing', I would recommend the geometric average that Marcus Müller put forward.
ACCURATE METHOD
The method below is aimed at obtaining 'well defined' statistics for performance over time that are also useful for 'after the fact' analysis.
Notice that the geometric average 'looks back' over recent messages rather than over a fixed time period.
Maintain a rolling array of 24*60/5 = 288 'prior success rates' (SR[i] with i=-1, -2,...,-288), each representing a 5-minute interval in the preceding 24 hours.
That will consume about 2.5K if the elements are 64-bit doubles.
To 'effect' constant updating use an Estimated 'Current' Success Rate as follows:
ECSR = (t*S/M+(300-t)*SR[-1])/300
where S and M are the counts of successes and messages in the current (partially complete) period, SR[-1] is the previous (now complete) bucket, and t is the number of seconds elapsed in the current bucket.
NB: when you first start up there is no SR[-1] yet, so just use S/M for the current partial bucket.
In essence the approximation assumes the error rate was steady over the preceding 5 - 10 minutes.
To 'effect' a 24-hour look back you can either 'shuffle' the data down (by copy or memcpy()) at the end of each 5-minute interval, or implement a circular array by keeping track of the current bucket index.
NB: For many management/diagnostic purposes intervals of 15 minutes are often entirely adequate. You might want to make the 'grain' configurable.
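A sketch of that circular-array bookkeeping in C; the bucket length, the names, and the global-state layout are my own assumptions:

#define BUCKET_SECONDS 300
#define N_BUCKETS      (24 * 60 * 60 / BUCKET_SECONDS)    /* 288 buckets */

static double sr[N_BUCKETS];     /* success rate per completed 5-min bucket */
static int    cur = 0;           /* index of the bucket being filled */
static long   successes = 0, messages = 0;  /* counts in the current bucket */
static int    t = 0;             /* seconds elapsed in the current bucket */

/* Call once per one-second transfer attempt. */
void record(int success) {
    messages++;
    successes += success;
    if (++t == BUCKET_SECONDS) {             /* bucket complete */
        sr[cur] = (double)successes / messages;
        cur = (cur + 1) % N_BUCKETS;         /* overwrite the oldest bucket */
        successes = messages = 0;
        t = 0;
    }
}

/* ECSR as defined above: blend the partial bucket with the last complete
   one. (Until the first bucket completes, sr[] is still 0; see the
   start-up note above.) */
double ecsr(void) {
    double partial = messages ? (double)successes / messages : 1.0;
    int prev = (cur + N_BUCKETS - 1) % N_BUCKETS;
    return (t * partial + (BUCKET_SECONDS - t) * sr[prev]) / BUCKET_SECONDS;
}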
I am writing an application which is recording some 'basic' stats -- page views, and unique visitors. I don't like the idea of storing every single view, so have thought about storing totals with a hour/day resolution. For example, like this:
Tuesday 500 views 200 unique visitors
Wednesday 400 views 210 unique visitors
Thursday 800 views 420 unique visitors
Now, I want to be able to query this data set on chosen time periods -- ie, for a week. Calculating views is easy enough: just addition. However, adding unique visitors will not give the correct answer, since a visitor may have visited on multiple days.
So my question is how do I determine or estimate unique visitors for any time period without storing each individual hit. Is this even possible? Google Analytics reports these values -- surely they don't store every single hit and query the data set for every time period!?
I can't seem to find any useful information on the net about this. My initial instinct is that I would need to store 2 sets of values with different resolutions (ie day and half-day), and somehow interpolate these for all possible time ranges. I've been playing with the maths, but can't get anything to work. Do you think I may be on to something, or on the wrong track?
Thanks,
Brendon.
If you are OK with approximations, I think tom10 is onto something, but his notion of a random subsample is not quite the right one, or needs a clarification. If I have a visitor who comes on day 1 and day 2 but is sampled only on day 2, that is going to introduce a bias into the estimation.
What I would do is store full information for a random subsample of users (say, all users whose hash(id) % 100 == 1). Then you do the full calculations on the sampled data and multiply by 100. Yes, tom10 said just about that, but there are two differences: he said to sample "for example" based on the ID, whereas I say that is the only way you should sample, because you are interested in unique visitors. If you were interested in unique IPs or unique ZIP codes or whatever, you would sample accordingly. The quality of the estimation can be assessed using the normal approximation to the binomial if your sample is big enough.
Beyond this, you can try to use a model of user loyalty: for instance, you observe that over 2 days 10% of visitors visit on both days; over 3 days 11% visit twice and 5% visit once; and so forth, up to some maximum number of days. Unfortunately these numbers can depend on the time of the week and the season, and even if you model those, loyalty changes over time as the user base matures and changes in composition, and as the service changes as well, so any such model needs to be re-estimated. My guess is that in 99% of practical situations you'd be better served by the sampling technique.
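A minimal sketch of that deterministic sampling in C; the 1% rate, the pre-hashed ID array, and all names are my assumptions:

#include <stdint.h>
#include <stdlib.h>

#define SAMPLE_MOD 100              /* keep roughly 1% of visitors */

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Estimate unique visitors over a period from a deterministic 1% sample.
   ids: hash(id) of every visit in the period (duplicates included). */
long estimate_uniques(const uint32_t *ids, size_t n) {
    uint32_t *sample = malloc(n * sizeof *sample);
    if (!sample) return -1;
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (ids[i] % SAMPLE_MOD == 1)   /* same users kept every day */
            sample[m++] = ids[i];
    qsort(sample, m, sizeof *sample, cmp_u32);
    long uniq = 0;
    for (size_t i = 0; i < m; i++)
        if (i == 0 || sample[i] != sample[i - 1])
            uniq++;
    free(sample);
    return uniq * SAMPLE_MOD;           /* scale the sample back up */
}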
You could store a random subsample of the data, for example, 10% of the visitor IDs, then compare these between days.
The easiest way to do this is to store a random subsample of each day for future comparisons, but then, for the current day, temporarily store all your IDs and compare them to the subsampled historical data and determine the fraction of repeats. (That is, you're comparing the subsampled data to a full dataset for a given day and not comparing two subsamples -- it's possible to compare two subsamples and get an estimate for the total but the math would be a bit trickier.)
You don't need to store every single view, just each unique session ID per hour or day depending on the resolution you need in your stats.
You can keep these log files of session IDs sorted, so that unique visitors can be counted quickly by merging multiple hours/days: one file per hour/day, one unique session ID per line.
In *nix, a simple one-liner like this one will do the job:
$ sort -m sorted_sid_logs/2010-09-0[123]-??.log | uniq | wc -l
It counts the number of unique visitors during the first three days of September.
You can calculate the uniqueness factor (UF) on each day and use it to calculate the composite (week by example) UF.
Let's say that you counted:
100 visits and 75 unique session IDs on Monday (you have to store the session IDs at least for a day, or for whatever period you use as the unit).
200 visits and 100 unique session IDs on Tuesday.
If you want to estimate the UF for the period Mon+Tue you can do:
UV = UVmonday + UVtuesday = TVmonday*UFmonday + TVtuesday*UFtuesday
where:
UV = Unique Visitors
TV = Total Visits
UF = Uniqueness Factor
So...
UV = (Sum(TVi*UFi))
UF = UV / TV
TV = Sum(TVi)
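With the numbers above: UFmonday = 75/100 = 0.75 and UFtuesday = 100/200 = 0.50, so
UV = 100*0.75 + 200*0.50 = 175
TV = 100 + 200 = 300
UF = 175/300 ≈ 0.58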
I hope it helps...
Note that this math counts the same person visiting on two different days as two unique visitors. I think that's OK if the only way you have to identify somebody is via the session ID.