Getting an average from a nested calculation with averageifs in excel - arrays

i am trying to pull performance data togther from a csv. We have a design team with a list of tasks which arrive on their list, sit there until dealt with, and then get completed. For any given day i want to calculate:
1/ The number of tasks on the list (so i'm running countifs, with citeria being 1: the design team, 2: the task had arrived on the list by the given day, and 3: had not been completed by the given day. This works ok.
2/ I also want to know what the average amonth of days those tasks had been on the list on that given day. IN the data set i have the day they arrive, and the day they got completed, and i know the given day i want to calculate for. So i think i ned to nest a formula to calculate (given date - arrival date) into an averageifs formula with the same conditions as the task count.
I have set this up in a sheet using a helper column to give me the days elapsed, but in practice, i want to run this for a number of given days so cannot add a helper column for each query.
So i have =AVERAGEIFS($G:$G,B:B,"Design",$C:$C, "<"&I$1, $E:$E, ">="&I$1)
but where $G:$G is a calculation of the days elapsed =IF(C2<>"",SUM(I$1-C2),"")
enter image description here

Related

How to calculate the hourly number of observation?

I need to calculate the hourly number of observations for a data set consists of around 100000 dates. each data-time represents one observation.
For example, I need to calculate the hourly events from 2018-11-25-20-35-00 to 2018-11-27-01-35-00.
Additionally, where there is no event there should be an output of zero.
The sample data set is shown in this figure

How many trucks came empty but bought something

I think this is the hardest to date I have had to crack - so hard I had a hard time finding a good headline.
So we have a site where trucks come and buy say Gravel, or sand or other building materials.
Sometimes they also unload demolition waste first.
I need to find out a couple of things
how many trucks (and from what companys) came empty
if they came empty what did they buy from us.
what companys are sending full trucks and what are sending empty trucks.
a tope 10 of materials they will drive to us from to buy even when coming empty to our facility.
a list of all the order numbers that they drove to us til fill and came with empty trucks. ( I have distances linked to order numbers, so now I can estimate the value of our products)
The data I have available:
I have a full data set of when what customer buys what and / or pay to deliver.
E.G.:
I can see the parts I need to split the data into I think it should be something like this
find all unique licence plates
somehow map if they bought materials within 30 minutes of
offloading demolition waste (most trucks will come between 2 and 10
times per day)
Present all this data (on a normal day we have about 800 trucks = 2000 lines since they weigh in, weigh out, and then some buy something = 2 more weigh lines)
I can easily find unique licence plates per day (either by formula or by Excel function Data/delete doublets,
but after that I have no clue where to start.
I think I need some sheets in between, where I somehow mark if a material was bought from an "empty truck" and I need a counter for that .. somehow...
Any help on how to get started is appreciated.
It seems like the best way to start is with a helper column (in the following exampes, I have chosen "Column M") to flag whether the truck arrived empty.
In the helper column, you can use something similar to the following formula.
{=IF(ISBLANK(B2),0,IF(C2="In",0,IF(B2=$B$2:$B$13,IF($C$2:$C$13="In",IF($A$2:$A$13>(A2-TIME(0,30,0)),0,1),1),1)))}
This is an array formula, which means you have to press ctrl+shift+enter after pasting it in the cell. Then you can copy that cell down the column.
Just to explain, the first if statement knows the truck is not arriving empty if Column C is 'In'. The second if statement creates an array and tests to see if other the same truck appears in other rows. The third if statement checks to see if the same truck checked 'In' in the matching rows, and the fourth if statement verifies if the time they checked in was less than thirty minutes ago. You can adjust the length by editing the TIME(0,30,0) function. The format is TIME(hours,minuites,seconds). Unless the truck matches all three of the second, third and fourth if statements, it is marked as coming empty.
Once you have this helper column, just about all of your tasks are quite simple.
1a: How many trucks came empty? Sum Column M
1b: How many trucks from what company? Create a unique list of companies. Then create a COUNTIFS formula based on Column M = 1 and Column K = Company. For example, if C32 had Company B then the formula =COUNTIFS($M$2:$M$13,1,$K$2:$K$13,C32) would return 2
1c: How many times did a truck come empty? Similar to 1b, create a unique list of License Plates, then use a COUNTIFS based on Column M = 1 and Column B = License Plate.
2: Similar to 1b, just use a unique list of products tested against Column F
3: Similar to 1b, just create a second column, next to the first that uses =COUNTIFS($M$2:$M$13,0,$K$2:$K$13,C53,$C$2:$C$13,"In") Which tests that Column M reports the truck did not come empty, that matches the company in Column K and that the truck came 'In' so you don't double count the same truck when it goes 'out'
4: Just sort list created by number 2. You can highlight the range, right-click and select "Sort" > "Custom Sort", then select the column you want to sort on and largest to smallest.
5: There are a couple of different ways, you could do this. The formula
{=TEXTJOIN(", ",TRUE,IF($M$2:$M$13=1,$J$2:$J$13,""))}
(again, entered as an array formula)
would create a comma separated list of order numbers. An alternative if you want a column of order numbers (but would only work if they are actually numbers), is to paste the formula {=MAX(IF($M$2:$M$13=1,$J$2:$J$13,))} in the first row of the column (in my example, its O2) and then {=MAX(IF($M$2:$M$13=1,IF($J$2:$J$13<O2,$J$2:$J$13,)))} in the row below (change the reference to O2 if you pasted it in a different spot)(again, note that both of these are array formulas). Then copy and paste the second formula down the column. When order numbers of trucks that came in empty are exhausted, the formula will report 0.

Multiplying arrays resulting from multiplying arrays in Excel

I've tried looking through some of the posts and I'm having trouble finding something that will help me in this situation.
I have a spreadsheet that has Total Sales, Retail Price, and Inventory for each week in a year for a list of 100 or so projects. These three pieces of info are displayed as columns repeated for the year, with a row for each item.
I was able to add up the total annual cells (every 3rd column) using SUMPRODUCT((MOD(COLUMN(D3:L3),3)=0)*D3:L3)
The next goal is to get a formula to calculate the weighted average retail. I basically need to find a formula that will end up with the SUMPRODUCT of an array of Sales data and Retail data.
I have tried to use some layering of MMULT and SUMPRODUCT but keep getting #VALUE! errors. Particularly with SUMPRODUCT(TRANSPOSE(MMULT(TRANSPOSE((MOD(COLUMN(D3,L3),3)=0)),D3,L3)),MMULT(TRANSPOSE((MOD(COLUMN(D3,L3),3)=1)),D3,L3)) and with putting braces in there as well =SUMPRODUCT({TRANSPOSE(MMULT(TRANSPOSE((MOD(COLUMN(D3,L3),3)=0)),D3,L3))},{MMULT(TRANSPOSE((MOD(COLUMN(D3,L3),3)=1)),D3,L3)})
Does anyone have any experience with this type of issue? I feel like it should be something that Excel can do without having to have separate sheets to calculate.
For your weighted average:
=SUMPRODUCT(($D$2:$FD$2="Sales")*$D3:$FD3*$E3:$FE3)/SUMIF($D$2:$FD$2,"Sales",$D3:$FD3)
And also, to add up the Sales, you could consider:
=SUMIF($D$2:$FD$2,"Sales",$D3:$FD3)
I am assuming FD is the last column of data for the year, but change it if that is not the case.

Rolling Arrays as a possible solution to a rolling window in SAS

I am trying to calculate the dosage of a particular drug for a population to see if any member in this population is over a certain threshold for any 90 consecutive days. So to do this I am thinking that I am going to need to make a arraty that looks at the strengh of this drug over 90 days from an index and if they are all '1' then they get a 'pass', and somehow put this into a do loop to look at all of the potential 90 day windows for a year i=1...i=275 to see if a member at any point during the year has met the criteria. Thoughts?

Web stats: Calculating/estimating unique visitors for arbitary time intervals

I am writing an application which is recording some 'basic' stats -- page views, and unique visitors. I don't like the idea of storing every single view, so have thought about storing totals with a hour/day resolution. For example, like this:
Tuesday 500 views 200 unique visitors
Wednesday 400 views 210 unique visitors
Thursday 800 views 420 unique visitors
Now, I want to be able to query this data set on chosen time periods -- ie, for a week. Calculating views is easy enough: just addition. However, adding unique visitors will not give the correct answer, since a visitor may have visited on multiple days.
So my question is how do I determine or estimate unique visitors for any time period without storing each individual hit. Is this even possible? Google Analytics reports these values -- surely they don't store every single hit and query the data set for every time period!?
I can't seem to find any useful information on the net about this. My initial instinct is that I would need to store 2 sets of values with different resolutions (ie day and half-day), and somehow interpolate these for all possible time ranges. I've been playing with the maths, but can't get anything to work. Do you think I may be on to something, or on the wrong track?
Thanks,
Brendon.
If you are OK with approximations, I think tom10 is onto something, but his notion of random subsample is not the right one or needs a clarification. If I have a visitor that comes on day1 and day2, but is sampled only on day2, that is going to introduce a bias in the estimation. What I would do is to store full information for a random subsample of users (let's say, all users whose hash(id)%100 == 1). Then you do the full calculations on the sampled data and multiply by 100. Yes tom10 said about just that, but there are two differences: he said "for example" sample based on the ID and I say that's the only way you should sample because you are interested in unique visitors. If you were interested in unique IPs or unique ZIP codes or whatever you would sample accordingly. The quality of the estimation can be assessed using the normal approximation to the binomial if your sample is big enough. Beyond this, you can try and use a model of user loyalty, like you observe that over 2 days 10% of visitors visit on both days, over three days 11% of visitors visit twice and 5% visit once and so forth up to a maximum number of day. These numbers unfortunately can depend on time of the week, season and even modeling those, loyalty changes over time as the user base matures, changes in composition and the service changes as well, so any model needs to be re-estimated. My guess is that in 99% of practical situations you'd be better served by the sampling technique.
You could store a random subsample of the data, for example, 10% of the visitor IDs, then compare these between days.
The easiest way to do this is to store a random subsample of each day for future comparisons, but then, for the current day, temporarily store all your IDs and compare them to the subsampled historical data and determine the fraction of repeats. (That is, you're comparing the subsampled data to a full dataset for a given day and not comparing two subsamples -- it's possible to compare two subsamples and get an estimate for the total but the math would be a bit trickier.)
You don't need to store every single view, just each unique session ID per hour or day depending on the resolution you need in your stats.
You can keep these log files containing session IDs sorted to count unique visitors quickly, by merging multiple hours/days. One file per hour/day, one unique session ID per line.
In *nix, a simple one-liner like this one will do the job:
$ sort -m sorted_sid_logs/2010-09-0[123]-??.log | uniq | wc -l
It counts the number of unique visitors during the first three days of September.
You can calculate the uniqueness factor (UF) on each day and use it to calculate the composite (week by example) UF.
Let's say that you counted:
100 visits and 75 unique session id's on monday (you have to store the sessions ID's at least for a day, or the period you use as unit).
200 visits and 100 unique session id's on tuesday.
If you want to estimate the UF for the period Mon+Tue you can do:
UV = UVmonday + UVtuesday = TVmonday*UFmonday + TVtuesday*UFtuesday
being:
UV = Unique Visitors
TV = Total Visits
UF = Uniqueness Factor
So...
UV = (Sum(TVi*UFi))
UF = UV / TV
TV = Sum(TVi)
I hope it helps...
This math counts two visits of the same person as two unique visitors. I think it's ok if the only way you have to identify somebody is via the session ID.

Resources