I'm trying to calculate the sum of best segments in a run. For example, each Km gives a list as such:
5:40 6:00 5:45 5:55 6:21 6 :30
I'm trying to gather the best segments of 2km/3km/4km etc and would like a simple code to do it. At the moment, I'm using the formula
=Min(If(B1=0,9:9:9,sum(A1:B1),If(C1=0,9:9:9,sum(B1:C1))
but this goes all the way to 50km, meaning a very long formulae that I then have to repeat slightly differently at 3km, then 4km, then 5km etc. Surely there must me a way of
generating an array of summed columns of every n column, then iterating over that to find the min while ignoring values of 0?
I can do it manually for now, but what if I want to go over 50km? I might want to incorporate bike rides/car drives in the future just for some data analysis so I figured it best finding an ideal formulae now.
It's frustrating as I could code it and I want to avoid VBA ideally and stick to formulae in Excel.
Here is a draft of the case where there aren't any zeroes just for groups of 2Km. I decided the simplest approach initially was to add a couple of helper rows containing the running total of times (and for later use counts) and use a formula like this to subtract them in pairs:
=MIN(INDEX(A2:J2,SEQUENCE(1,9,2))-IF(SEQUENCE(1,9,0)=0,0,INDEX(A2:J2,SEQUENCE(1,9,0))))
but if you have access to recent additions to Excel 365 like Scan you can do it without helper rows.
Here is a more realistic scenario with a couple of zeroes thrown in
=LET(runningSum,Y$4:AP$4,runningCount,Y$5:AP$5,cols,COLUMNS(runningSum),leg,X7,
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
Note that there are no runs of more than seven consecutive legs that don't contain a zero so 8, 9, 10 etc. just work out to 0.
As mentioned you could dispense with the helper rows by using Scan, but not everyone has access to this so I will add it separately:
=LET(data,Y$3:AP$3,runningSum,SCAN(0,data,LAMBDA(a,b,a+b)),
runningCount,SCAN(0,data,LAMBDA(a,b,a+(b>0))),leg,X7,cols,COLUMNS(data),
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
Tom that worked! I learnt a few things on the way too and using the indexing method alongside sequence and columns is something I had not thought of. I'd never heard of the LET command before and I can already see that this is going to really help with some of the bigger calculations in the future.
Thank you so much, I'd like to show you how it now looks. Row 3087 is my old formula, row 3088 is a copy of the same data using the new formula, as you can see I've gotten exactly the same results so it's clear that it works perfectly and it is can be easily duplicated.
This is hard to explain so my title sucks, and is just my best guess at how I might be able to approach this. I have a Google Sheet of sales data for cases of various bottle sizes of kombucha. Column E is the sale date, Column G contains the item code, and column J is the quantity sold of said cases. See my (vastly simplified) sample data:
https://docs.google.com/spreadsheets/d/17-LzGrNJtBr-FwOZtdaoCws3ayeGOHu_TdtGOfXj4cA/edit?usp=sharing
See my current test code below (also present in the Formula tab of the linked spreadsheet). It successfully gives me the combined number of cases sold of half-liter bottles and Growlers. The values in E4 and E5 are cells containing my start and end dates, respectively, so I'm constraining the results only to those which fall within a certain date range.
This code works, but now I need to figure out a way to sum the total number of bottles sold instead of # of cases. The data set is already massive and pushing the limits of google sheets, so adding a column to the source data sheet with # of bottles per case is not an option. Half liter cases hold 13 bottles, and growlers hold 5. Is there any way to do this with my current approach, using another array perhaps? Or any other approach that keeps the formula as simple as possible?
FYI the current formula is a proof of concept and I will be adding many additional types of cases to the existing formula, each containing a different number of bottles per case, and using it as part of a larger dynamic formula that allows you to switch between showing # cases vs # bottles vs # of actual liters sold, so this is why I am hoping to find an array-based approach that will let me do this without needing to resort to an absurdly long and complex formula of nested IF statements.
=SUMPRODUCT(--((XeroInvoiceData!$E$3:$E>=B4)*(XeroInvoiceData!$E$3:$E<=B5)), (--(ISNUMBER(MATCH(XeroInvoiceData!$G$3:$G, {"HalfLiterCase","GrowlerCase"}, 0)))), XeroInvoiceData!$J$3:$J)
I would be eternally grateful for any assistance.
Here is my solution:
https://docs.google.com/spreadsheets/d/1ig0krumJu4Lj9-nIKJyRfPLTYbU-mzOL0JokRUDEqNc/edit?usp=sharing
My idea was to filter your table on date and sum by the type of container.
I wanted also to allow new types of containers that contain smaller units (bottles or liters).
I divided this job into 3 stages.
First we have to filter this table according to selected dates and container types.
I prepared a list that may be extended (all you need is to extend the filter range).
Then I have to vlookup values of units in each container and I try to do it inside the same formula.
General idea is
={[query results],arrayformula(ifna(vlookup([first column of query],$C$21:$D$26,2,0)*[second column of query])}
I divide it into 2 stages.
First stage referrs to query results in adjacent table:
Second stage uses indexes of query so formula is quite long:
Tell me if it solves your problem.
Consider the following table
I need to generate a bar chart with the Category Group = "Country". The chart should only display the top 3 Groups based on the count of records for a country. I have already applied a filter for the Category Group specifying the Top N Condition as 3 for Count(Country). The chart generated, applies the filter as expected based on count, but i need only 3 bars to be shown even if there are bars with duplicate values.
Below is the chart that I get.
Expected Result
Now I know that, i can create an additional column in my dataset with ranking values and then apply a filter on this column to get the expected result (i have tried this, and it works)
Is there a way to achieve the expected result without changing the underlying Dataset?
Note: The dataset shown above is a highly simplified version of my dataset. In reality i have a huge dataset with a lot of columns. The same dataset has been used for various charts (with groupings on different columns).
This was an interesting question as I've always just "solved" the tiebreaker in the dataset without much thought. However, I do see a fairly easy way to use the rnd() function to dissolve the ties as long as you don't care which of the tied countries is shown:
=(Count(Fields!Country.Value) * 1000) + (Rnd() * 100)
Which essentially just weights the count per country into the thousands and then tiebreaks with a random small value:
New York: 30XX
France: 20XX
China: 10XX
Italy: 10XX
Singapore: 10XX
If you wanted to actually solve the tiebreaker with an alphabetical preference, you could do something similar but incorporate the numeric value for the first letter of the country etc...
I just found out, I could use an x-axis minimum of 0 and an maximum of 10.5 together with an interval setting of 1.
So, I was able to achieve a top 10 limitation and the axis labels show the names - (it may be a side effect but when I changed the axis maximum to a whole number, the axis no longer shows the names but numbers).
I was not really happy with the other approaches. They seems much of an overkill to me for such a simple requirement like limiting the number of bars shown in a chart.
I was wondering what is wrong with the following formula.
IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF
What I am trying to do "If the cell is blank then fill the cell with the average of all other cells called [Age].
Many thanks all!
We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.
The First Way: Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" on Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantages of using the tool directly are that it is much less complicated than the other way of doing it. The downsides are 1) if you are attempting to fix a large number of rows relative to machine specs it can be incredibly slow (much slower than the next way), and 2) it occasionally errors when we use it in our process without much explanation.
The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream. You will have the same average values for each row in your database. At that point, you can use the Formula tool as you attempted in your question. E.g.
IF [Age] = Null() THEN [Ave_Age] ELSE [Age] ENDIF
The second way is significantly faster to run for extremely large datasets (e.g. fixing possible nulls in a few dozen columns over 70 million rows), but is much more time intensive to set up and must be created for each column.
That is not the way the Average function works. You need to pass it the entire list of values, not just one.
I am using a report to show wastage calculations after converting our stock.
I am trying to get the totals to display at the bottom, however some of the columns are made up of fields that are calculated (such as sValue-Wastage)
For these fields I can't seem to use =Sum(rValue), it acts as though this is a variable that is to be decided by the user when I input this, it also adds square brackets around rValue (=Sum([rValue])).
Is there some different way of acheiving this that I need to know about? I am trying to get:
=Sum(Wastage)
Where Wastage is:
=[sValue]-[rValue]
Thanks,
Bob P
You must sum the two calculated fields.
=Sum([sValue]-[rValue])
From your comments, it appears that rValue holds a domain aggregate function. You can try:
=Sum([sValue]-DSum("sValue","[Stock Conversion Items]")
This will be okay if all SCIDs in [Stock Conversion Items] should be included, if not you may be able to add a WHERE statement. If the relationship is too complicated for a WHERE statement, consider basing your report on a query that includes [Stock Conversion Items].
It is an old story: http://support.microsoft.com/kb/208850