Excel calculate smallest of X columns within Y columns, ignoring zeros - arrays

I'm trying to calculate the sum of best segments in a run. For example, each Km gives a list as such:
5:40 6:00 5:45 5:55 6:21 6:30
I'm trying to find the best segments of 2 km, 3 km, 4 km etc. and would like a simple formula to do it. At the moment, I'm using the formula
=MIN(IF(B1=0,9:9:9,SUM(A1:B1)),IF(C1=0,9:9:9,SUM(B1:C1)),...)
but this goes all the way to 50 km, meaning a very long formula that I then have to repeat slightly differently for 3 km, then 4 km, then 5 km etc. Surely there must be a way of generating an array of sums of every n consecutive columns, then iterating over that to find the minimum while ignoring values of 0?
I can do it manually for now, but what if I want to go over 50 km? I might want to incorporate bike rides or car drives in the future just for some data analysis, so I figured it was best to find an ideal formula now.
It's frustrating because I could code it, but ideally I want to avoid VBA and stick to formulae in Excel.

Here is a draft for the case where there aren't any zeroes, just for groups of 2 km. I decided the simplest approach initially was to add a couple of helper rows containing the running total of times (and, for later use, counts) and use a formula like this to subtract them in pairs:
=MIN(INDEX(A2:J2,SEQUENCE(1,9,2))-IF(SEQUENCE(1,9,0)=0,0,INDEX(A2:J2,SEQUENCE(1,9,0))))
If you have access to recent additions to Excel 365 such as SCAN, you can do this without helper rows.
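To make the running-total trick concrete, here is a minimal Python sketch (an illustration, not Excel) of the same idea. It assumes lap times have been converted to seconds and that no leg is zero; the function name `best_window_no_zeros` is hypothetical. The `running` list plays the role of the helper row, and its leading 0 stands in for the `IF(SEQUENCE(1,9,0)=0,0,...)` part of the formula.

```python
from itertools import accumulate


def best_window_no_zeros(times, leg):
    """Smallest total over `leg` consecutive legs, assuming no zero entries.

    `running` is the helper row of running totals with a leading 0, so the
    sum of the legs ending at position `end` is running[end] - running[end - leg].
    """
    running = [0] + list(accumulate(times))
    return min(running[end] - running[end - leg] for end in range(leg, len(running)))


# Splits from the question, in seconds: 5:40 6:00 5:45 5:55 6:21 6:30
splits = [340, 360, 345, 355, 381, 390]
print(best_window_no_zeros(splits, 2))  # 700 s, i.e. 11:40
print(best_window_no_zeros(splits, 3))  # 1045 s, i.e. 17:25
```

Asking for a leg longer than the data leaves `min()` with nothing to compare and raises `ValueError`, which is roughly where the Excel version would return an error too.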
Here is a more realistic scenario with a couple of zeroes thrown in:
=LET(runningSum,Y$4:AP$4,runningCount,Y$5:AP$5,cols,COLUMNS(runningSum),leg,X7,
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
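Note that there are no runs of more than seven consecutive legs that don't contain a zero, so the results for 8, 9, 10 etc. just work out to 0 (MIN ignores the FALSE values returned by IF and gives 0 when no numbers are left).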
As mentioned, you could dispense with the helper rows by using SCAN, but not everyone has access to it, so I will add it separately:
=LET(data,Y$3:AP$3,runningSum,SCAN(0,data,LAMBDA(a,b,a+b)),
runningCount,SCAN(0,data,LAMBDA(a,b,a+(b>0))),leg,X7,cols,COLUMNS(data),
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
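The same logic with zeros handled can be sketched in Python, again as an illustration under the assumption that times are in seconds and a 0 marks a missing leg. Here `itertools.accumulate` stands in for SCAN, and a window only counts when its running-count difference equals the leg length. Returning 0 when nothing qualifies mirrors what `MIN(IF(counts=leg,times))` produces. The function name is hypothetical.

```python
from itertools import accumulate


def best_window(times, leg):
    """Smallest total over `leg` consecutive legs, skipping windows that contain a 0.

    Returns 0 when no window qualifies, like MIN(IF(counts=leg,times)) in Excel.
    """
    # Like SCAN(0,data,LAMBDA(a,b,a+b)), with a leading 0 prepended.
    running_sum = [0] + list(accumulate(times))
    # Like SCAN(0,data,LAMBDA(a,b,a+(b>0))): running count of non-zero legs.
    running_count = [0] + list(accumulate(1 if t > 0 else 0 for t in times))
    candidates = [
        running_sum[end] - running_sum[end - leg]
        for end in range(leg, len(running_sum))
        # Keep the window only if every leg in it is non-zero.
        if running_count[end] - running_count[end - leg] == leg
    ]
    return min(candidates, default=0)


splits = [340, 360, 0, 345, 355, 381, 0, 390]  # zeros mark missing legs
print({leg: best_window(splits, leg) for leg in range(1, 5)})
# {1: 340, 2: 700, 3: 1081, 4: 0}
```

As in the spreadsheet, a leg length with no zero-free window (4 here) comes back as 0 rather than an error.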

Tom, that worked! I learnt a few things along the way too; using INDEX alongside SEQUENCE and COLUMNS is something I had not thought of. I'd never heard of the LET function before, and I can already see that it is going to really help with some of the bigger calculations in the future.
Thank you so much; I'd like to show you how it now looks. Row 3087 is my old formula and row 3088 is a copy of the same data using the new formula. As you can see, I get exactly the same results, so it clearly works perfectly and can be easily duplicated.

Related

I've got a pipe that consists of 5 pieces, each with 5 properties

Inlet -> front -> middle -> rear -> outlet
Each of the five properties has a value anywhere between 4 and 40. Now I want to find a match where, when a single property is summed across all the pipe pieces, the total is a multiple of 5 (either a whole 10 or ending in 5). There might be hundreds of different pipe pieces, all with different properties.
So if I have all 5 pieces and, when summed, their properties come to 54, 51, 23, 71, 37, that is not good and not what I'm looking for.
Instead, 55, 50, 25, 70, 40 would be perfect.
My trouble is that there are so many pieces that it would be insane to do the mix-and-matching manually, and new ones come up frequently.
I have already inserted about 100 of these into SQLite manually, but they should be easy to convert into Excel or other database formats, so the answer can use anything like MySQL or Google Sheets.
I need a calculation that takes every piece into account and either reports "no match" or tells me the ID of each piece required for a match; if multiple matches are available, it should list them separately.
Edit: Even just the math needed for this kind of calculation would be a big help here; I'm not much of a math guy myself. I guess there should be a reference piece that I use, which then gets checked against every possible scenario.
If the value you want to verify is in A1, use =ROUND(A1/5,0)*5 to round it to the nearest multiple of 5.
If the pipes may not be shorter than the given values, use =CEILING(A1,5) to round up to the next multiple of 5.
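The two formulas above snap a single total to a multiple of 5, but they don't search for combinations. As a rough illustration of the underlying math, here is a brute-force Python sketch. It assumes (hypothetically) that pieces are grouped by position (inlet, front, middle, rear, outlet), each stored as an `(id, [five property values])` tuple, and that a combination matches when every summed property is divisible by 5. `round_to_5` and `ceiling_to_5` mirror the two formulas for non-negative values; all names here are illustrative.

```python
import math
from itertools import product


def round_to_5(x):
    """Like =ROUND(A1/5,0)*5 for non-negative x (halves round away from zero)."""
    return math.floor(x / 5 + 0.5) * 5


def ceiling_to_5(x):
    """Like =CEILING(A1,5) for non-negative x: round up to the next multiple of 5."""
    return math.ceil(x / 5) * 5


def matching_combinations(pieces_by_position):
    """Yield ID tuples (one piece per position) whose summed properties are all multiples of 5.

    pieces_by_position: hypothetical layout, a list (one entry per position) of
    lists of (piece_id, [p1, p2, p3, p4, p5]) tuples.
    """
    for combo in product(*pieces_by_position):
        # Sum property 1 across the chosen pieces, then property 2, and so on.
        totals = [sum(values) for values in zip(*(props for _, props in combo))]
        if all(t % 5 == 0 for t in totals):
            yield tuple(piece_id for piece_id, _ in combo)


# Two positions only, to keep the example small.
pieces = [
    [("A", [20, 10, 5, 30, 15]), ("B", [21, 10, 5, 30, 15])],
    [("C", [35, 40, 20, 40, 25]), ("D", [34, 40, 20, 40, 25])],
]
matches = list(matching_combinations(pieces))
print(matches if matches else "no match")  # [('A', 'C'), ('B', 'D')]
```

Trying every combination grows multiplicatively (100 pieces in each of 5 positions is 10^10 combinations), so for hundreds of pieces a natural next step would be to reduce each property to its remainder mod 5 and group pieces that share the same remainder pattern.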

Count down a column until value in column meets condition

I have a simple daily rainfall data set and would like to calculate the antecedent dry period for each day. Here, I'm defining a dry day as one with rainfall "<10". I'm fairly unfamiliar with INDEX(), MATCH(), and other fancy array functions but feel like I'll need to use them.
For example, in the image, for 1/17/2020, the values in cells C3:C9 = 0, C10 = 1 and C11:C13 = 0. I've tried various versions of COUNTIF(), COUNTIFS(), and IF() but I cannot get the step-wise counting and reset behavior necessary when extended dry spells or brief rain periods occur with gaps. Thanks!
You are right, you need to use MATCH. Basically, you need to search for the next antecedent wet day (of which there are many here in Manchester, England at the moment) and subtract 1 (Formula 1):
=MATCH(TRUE,INDEX(B15:B$1000>=10,0),0)-1
where B$1000 may need changing to include all of your data. Using INDEX here is just a bit of a hack to avoid having to enter the formula as an array formula.
As you can see, there is an issue when you reach the end of the range, which I will come to in a minute.
In this case, we want to count the number of antecedent dry days to the end of the range (Formula 2):
=IFERROR(MATCH(TRUE,INDEX(B4:B$1000>=10,0),0)-1,COUNTIF(B4:B$1000,"<10"))
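The logic of Formula 2 (count dry days from the current row down to the next wet day, or count the remaining dry days when no wet day follows) can be sketched in Python as an illustration. It assumes the rainfall values are in a list in the same top-to-bottom order as the sheet, so that antecedent days lie further down as in the formulas, and that there are no blank cells. Walking the list backwards lets each row reuse the count from the row below; the function name is hypothetical.

```python
def dry_run_lengths(rainfall, threshold=10):
    """For each row, count consecutive dry days (< threshold) from that row downward.

    A wet row gets 0; otherwise the count is 1 + the count of the row below,
    which also covers a dry spell that runs to the end of the data.
    """
    counts = [0] * len(rainfall)
    run = 0
    for i in range(len(rainfall) - 1, -1, -1):
        run = run + 1 if rainfall[i] < threshold else 0
        counts[i] = run
    return counts


print(dry_run_lengths([0, 0, 12, 3, 0, 15, 1]))  # [2, 1, 0, 2, 1, 0, 1]
```

The last value shows the end-of-range case that the IFERROR/COUNTIF part of Formula 2 handles: a trailing dry day with no wet day after it still counts as 1.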
If the range ended with a dry spell, you would get this:

Multiple IF QUARTILEs returning wrong values

I am using a nested IF statement inside QUARTILE, and it only partly works: it returns values that are slightly off from what I expect when I calculate them manually from the same range of values.
I've looked around, but most of the posts and research are about designing the formula; I haven't come across anything compelling about this odd behaviour.
My formula (entered with Ctrl+Shift+Enter, as it's an array formula): =QUARTILE(IF(((F2:$F$10=$W$4)*($Q$2:$Q$10=$W$3))*($E$2:$E$10=W$2),IF($O$2:$O$10<>"",$O$2:$O$10)),1)
The full dataset:
0.868997877*
0.99480118
0.867040346*
0.914032128*
0.988150438
0.981207615*
0.986629288
0.984750004*
0.988983643*
*The formula has 3 AND conditions that need to be met, and should return this range:
0.868997877
0.867040346
0.914032128
0.981207615
0.984750004
0.988983643
from which the 25th percentile is calculated.
If I take the output of the formula, the 25th percentile (QUARTILE with quart = 1) is 0.8803, but if I calculate it manually from the data points listed above, it comes out to 0.8685, and I can't see why.
I suspect it's because the IF statements pick up a slightly different range, or because the values that meet the IF conditions come from different rows.
If you look at the table here, you can see that there is more than one way of estimating a quartile (or other percentile) from a sample, and Excel offers two. The calculation you are doing by hand must be equivalent to QUARTILE.EXC, while QUARTILE in your formula behaves like QUARTILE.INC.
Basically, both functions work out the rank of the quartile value. If the rank isn't an integer, they interpolate (e.g. a rank of 1.5 means the quartile lies halfway between the first and second numbers in ascending order). You might think there wouldn't be much difference, but for small samples the difference is substantial. For the first quartile of N values:
QUARTILE.EXC: rank = (N+1)/4
QUARTILE.INC: rank = (N+3)/4
Here's how it would look with your data
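With the six values that meet the conditions (N = 6), QUARTILE.INC uses rank (6+3)/4 = 2.25 and QUARTILE.EXC uses rank (6+1)/4 = 1.75. A small Python sketch of the interpolation, assuming the rank rules above and using hypothetical function names, reproduces both of your numbers:

```python
def value_at_rank(values, rank):
    """Interpolate the value at a 1-based, possibly fractional rank in the sorted data."""
    xs = sorted(values)
    if not 1 <= rank <= len(xs):
        raise ValueError("rank must lie between 1 and N")
    lower = int(rank)      # 1-based position of the lower neighbour
    frac = rank - lower    # how far to move towards the next value
    if frac == 0:
        return xs[lower - 1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def quartile_exc(values, quart=1):
    return value_at_rank(values, quart * (len(values) + 1) / 4)


def quartile_inc(values, quart=1):
    return value_at_rank(values, quart * (len(values) - 1) / 4 + 1)


data = [0.868997877, 0.867040346, 0.914032128, 0.981207615, 0.984750004, 0.988983643]
print(round(quartile_inc(data), 4))  # 0.8803, what QUARTILE returns in the formula
print(round(quartile_exc(data), 4))  # 0.8685, the hand calculation
```

So the IF conditions are selecting the right rows; the gap comes entirely from the choice of rank rule.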

Why is this Alteryx formula returning 0s instead of averages?

I was wondering what is wrong with the following formula.
IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF
What I am trying to do is: "If the cell is blank, fill it with the average of all the other cells in [Age]."
Many thanks all!
We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.
The First Way: Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" under Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantage of using the tool directly is that it is much less complicated than the other way. The downsides are 1) if you are fixing a large number of rows relative to your machine specs, it can be incredibly slow (much slower than the next way), and 2) it occasionally errors in our process without much explanation.
The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream, so that every row carries the same average values. At that point, you can use the Formula tool as you attempted in your question, e.g.:
IF [Age] = Null() THEN [Ave_Age] ELSE [Age] ENDIF
The second way is significantly faster to run for extremely large datasets (e.g. fixing possible nulls in a few dozen columns over 70 million rows), but is much more time intensive to set up and must be created for each column.
That is not the way the Average function works. You need to pass it the entire list of values, not just one.
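Both answers come down to the same pattern: compute one average over the whole column first, then substitute it wherever the value is null. A minimal Python sketch of that pattern (not Alteryx), assuming `None` stands in for Null and using the standard `statistics.mean`; the function name is hypothetical:

```python
from statistics import mean


def impute_with_average(ages):
    """Replace None entries with the mean of the non-null entries."""
    known = [a for a in ages if a is not None]
    # Summarize step: one average for the whole column, from non-null values only.
    ave_age = mean(known)  # raises StatisticsError if every value is None
    # Formula step: IF [Age] = Null() THEN [Ave_Age] ELSE [Age] ENDIF
    return [ave_age if a is None else a for a in ages]


print(impute_with_average([30, None, 40, None]))  # [30, 35, 40, 35]
```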

Formatting based on last digit

I'm using Microsoft SQL Server Report Builder to list some production data in a table, mainly part numbers. I would like it to change the fill color of the part number cells based on the number in the cell.
Previously we have been using a solution that uses mod10 to set the color based on the last digit. This causes a repeat for every tenth part number, but that is fine. However, we have now started a new series, which means that I need to deal with the numbers 1-9. Obviously, the mod10 trick does not work here. Is there a smarter way of getting the last digit that also works on numbers from 1-9, or do I have to write some sort of IIF statement?
Here is an example of the code I use, though with mod5 rather than mod10:
=Choose(1 + Fields!cPri_runnr.Value Mod 5,"DarkOliveGreen","Olive","LimeGreen","Yellow","Khaki")
There are several options here: if Mod is sufficient, you can use Choose, Switch, or even IIF. In my opinion, the best solution would use Custom Code to hash the part number (or even take multiple inputs from the details row) and return a color directly. This could then easily be reused in multiple sections of the report (chart colors, additional cell background colors, even cell text color).
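As a sketch of the logic only (Python rather than the report's expression language or Custom Code), the Choose/Mod pattern and the "hash the part number" idea look like this. For what it's worth, n Mod 10 returns n itself for 1-9, so it still yields the last digit for the new series. The palette and function names are hypothetical, and `zlib.crc32` is used because, unlike Python's built-in `hash()` for strings, it gives the same result on every run.

```python
import zlib

# Hypothetical palette, using the same colors as the Choose(...) example above.
PALETTE = ["DarkOliveGreen", "Olive", "LimeGreen", "Yellow", "Khaki"]


def color_by_mod(run_number, palette=PALETTE):
    """Mirror of =Choose(1 + n Mod 5, ...): Choose is 1-based, Python lists are 0-based."""
    return palette[run_number % len(palette)]


def color_by_hash(part_number, palette=PALETTE):
    """Map an arbitrary part-number string to a palette entry via a stable checksum."""
    return palette[zlib.crc32(str(part_number).encode("utf-8")) % len(palette)]


print([n % 10 for n in range(1, 12)])      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
print(color_by_mod(7))                     # LimeGreen
print(color_by_hash("P-1007") in PALETTE)  # True
```

The hash version spreads arbitrary part numbers across the palette without depending on their last digit, at the cost of colors no longer following a visible pattern.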

Resources