Google Sheets ARRAYFORMULA Calculating Very Slow

I made a Google Sheet to track some metrics at my job. In a simple explanation, I create annotations (or tracks) on images and submit them for review. If a task is reviewed and sent back to me, I fix it, add whatever I missed, and send it back.
In order to track my progress, I wanted to create a formula that tracks how many total tracks I missed for each task (most recent attempt - earliest attempt). It took some finagling, but I finally got a formula that works. My only complaint is that it runs extremely slowly. Here is my formula:
=ARRAYFORMULA({"Number of Tracks Missed"; IF(NOT(ISBLANK($F$2:$F)), IF($G$2:$G = 1, 0, (VLOOKUP($F$2:$F,SORT($B$2:$C, $A$2:$A, FALSE), 2, FALSE)) - (VLOOKUP($F$2:$F, $B$2:$C, 2, FALSE))), "")})
Column G is a count of how many times I have worked on a task, so if I've only had it once, then I didn't miss anything and that equates to 0. If it's been more than once, I use a VLOOKUP combined with a SORT that flips my list of tasks into reverse order, so the VLOOKUP finds the most recent attempt, then subtract my earliest attempt from it.
Like I said, this formula works and gives the correct result every time; it's just super slow. Is there a way I can speed this calculation up, or is this the best/only option I have? I'm somewhat new to using formulas like this, so any help is appreciated.
EDIT:
Thanks to player0 for pointing me in the right direction. My formula was fine; the problem was all the extra rows I had in the sheet. For some reason, there were 32,000+ rows. With the ARRAYFORMULA calculating for every single one, even if just with blanks, it was taking a long time.
As soon as I deleted the extras, it started calculating almost immediately.

somewhere along the path of creating your sheet an error occurred and added 32k empty rows, which is what caused the slow response.
if you don't need so many rows - delete them. your formula is fine.
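if deleting the rows is not practical (or they keep creeping back), another option is to hard-bound every open-ended range so the formula only spans rows you actually use; a sketch, assuming your data stays within 1,000 rows:
=ARRAYFORMULA({"Number of Tracks Missed"; IF(NOT(ISBLANK($F$2:$F$1000)), IF($G$2:$G$1000 = 1, 0, VLOOKUP($F$2:$F$1000, SORT($B$2:$C$1000, $A$2:$A$1000, FALSE), 2, FALSE) - VLOOKUP($F$2:$F$1000, $B$2:$C$1000, 2, FALSE)), "")})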

Related

Excel calculate smallest of X columns within Y columns, ignoring zeros

I'm trying to calculate the sum of best segments in a run. For example, each Km gives a list as such:
5:40 6:00 5:45 5:55 6:21 6:30
I'm trying to gather the best segments of 2km/3km/4km etc and would like a simple code to do it. At the moment, I'm using the formula
=MIN(IF(B1=0,"9:9:9",SUM(A1:B1)),IF(C1=0,"9:9:9",SUM(B1:C1)), ...)
but this goes all the way to 50km, meaning a very long formula that I then have to repeat slightly differently at 3km, then 4km, then 5km, etc. Surely there must be a way of generating an array of summed columns for every n columns, then iterating over that to find the min while ignoring values of 0?
I can do it manually for now, but what if I want to go over 50km? I might want to incorporate bike rides/car drives in the future just for some data analysis, so I figured it best to find an ideal formula now.
It's frustrating, as I could code this, but I want to avoid VBA ideally and stick to formulas in Excel.
Here is a draft of the case where there aren't any zeroes, just for groups of 2km. I decided the simplest approach initially was to add a couple of helper rows containing the running total of times (and, for later use, counts) and use a formula like this to subtract them in pairs:
=MIN(INDEX(A2:J2,SEQUENCE(1,9,2))-IF(SEQUENCE(1,9,0)=0,0,INDEX(A2:J2,SEQUENCE(1,9,0))))
but if you have access to recent additions to Excel 365 like SCAN, you can do it without helper rows.
Here is a more realistic scenario with a couple of zeroes thrown in
=LET(runningSum,Y$4:AP$4,runningCount,Y$5:AP$5,cols,COLUMNS(runningSum),leg,X7,
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
Note that there are no runs of more than seven consecutive legs that don't contain a zero, so legs of 8, 9, 10 etc. just work out to 0.
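For completeness, the helper rows themselves are simple; a sketch, assuming the raw times sit in Y3:AP3: put =SUM($Y$3:Y3) in Y4 and fill right for the running total, and =COUNTIF($Y$3:Y3,">0") in Y5 and fill right for the running count of non-zero legs.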
As mentioned, you could dispense with the helper rows by using SCAN, but not everyone has access to it, so I will add it separately:
=LET(data,Y$3:AP$3,runningSum,SCAN(0,data,LAMBDA(a,b,a+b)),
runningCount,SCAN(0,data,LAMBDA(a,b,a+(b>0))),leg,X7,cols,COLUMNS(data),
seqEnd,SEQUENCE(1,cols-leg+1,leg),seqStart,SEQUENCE(1,cols-leg+1,0),
times,INDEX(runningSum,seqEnd)-IF(seqStart=0,0,INDEX(runningSum,seqStart)),
counts,INDEX(runningCount,seqEnd)-IF(seqStart=0,0,INDEX(runningCount,seqStart)),
MIN(IF(counts=leg,times)))
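If SCAN is new to you, it walks along an array carrying an accumulator and returns every intermediate value. A quick illustration you can type into any Excel 365 cell (the sample values are made up): =SCAN(0,{5,0,3},LAMBDA(a,b,a+b)) spills {5,5,8} (a running sum), while =SCAN(0,{5,0,3},LAMBDA(a,b,a+(b>0))) spills {1,1,2} (a running count of non-zero entries) - exactly the two quantities the formula above computes inline.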
Tom, that worked! I learned a few things along the way too, and using the indexing method alongside SEQUENCE and COLUMNS is something I had not thought of. I'd never heard of the LET command before, and I can already see that it is going to really help with some of the bigger calculations in the future.
Thank you so much. I'd like to show you how it looks now: row 3087 is my old formula, and row 3088 is a copy of the same data using the new formula. As you can see, I've gotten exactly the same results, so it's clear that it works perfectly and can be easily duplicated.

Return only single max value row in Power BI Desktop

I have the following table of Parts sold for a particular job, identified by the Order Number.
I am trying to extract just the Description of the most expensive part so that I can put it onto a single value card.
I have tried for a day mucking around with CALCULATE, MAX, TOP, SELECTEDVALUE; I can't seem to figure it out. I'm sure it is something simple too...
Would appreciate it if somebody can help me retrieve it in a way that I can see what I missed and learn for future.
My page is filtered by DrillThrough on the Order Number which filters the parts list for me.
Essentially, I just want the card to show 'PUMP,DTH,ELE'. My approach was to just select the top 1 row when the parts list is sorted descending by Amount in LC, but so far it has not been as simple as that :(
Should it be a calculated column or a measure on my Order table which has that string?
You should be able to create a measure that does this and then place that measure on a card.
Most Expensive Part = LOOKUPVALUE(Parts[Description],Parts[Amount],MAX(Parts[Amount]))
The MAX(Parts[Amount]) piece gives you the maximum amount. Then you look up the description corresponding to that amount.
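One caveat, hedged: LOOKUPVALUE raises an error if two parts happen to share the same maximum Amount but have different Descriptions. If ties are possible in your data, a sketch of a tie-tolerant variant (it lists every tied description, comma-separated):
Most Expensive Part = CONCATENATEX(TOPN(1, Parts, Parts[Amount], DESC), Parts[Description], ", ")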

Using FDQuery.RecordCount to tell if a query is empty may return a negative value in Delphi

I have an FDQuery on a form, and an action that enables and disables components according to its RecordCount (enabled when > 0).
Most of the time, the RecordCount property returns the actual count of records in the query. Sometimes RecordCount returns a negative value, even though I can see records on a grid associated with the query.
So far, RecordCount has returned values between -5 and -1.
How can I solve this? Why does it return negative values?
Why does it return negative values?
That is not FireDAC specific; it is normal behavior for all Delphi libraries providing the TDataSet interface.
TDataSet.RecordCount was never guaranteed to work. This is even stated in the documentation; it is bonus functionality. Sometimes it may work, other times it will not. It is only expected to work reliably for non-SQL table data like CSV, DBF, Paradox tables and other similar ISAM engines.
How can I solve this?
Relying on this function in the modern world is always a risky endeavor, so you had better design your program so that this function is never used, or only in a few very specific scenarios.
Instead, you should understand what question your program really asks of the library, and then find a "language" to ask that question by calling other functions better tailored to your case.
Now tell me: when you search in Google, how often do you read through to the 10th page of results, or the 100th? Almost never, right? Your program's users would also almost never scroll a data grid really far down. Keep this in mind.
You always need to show users the first data, and do it fast. But rarely the last data.
Now, three examples.
1) You read data from some remote server over a slow connection and can only read 100 rows per second. Your grid has room to show the first 20 rows; the user has to scroll for the rest. In total, the query can filter 10,000 rows for you.
If you just show those 20 rows to the user, it works almost instantly: only 0.2 seconds pass between starting to read data and presenting the filled grid. The rest of the data would only be fetched if the user requested it by scrolling (I am simplifying a bit here for clarity; I know about pre-caching and TDataSet.Buffers).
So what does your program do if you call RecordCount? It downloads ALL the records into local memory, where it counts them. At that speed it would take 10,000 / 100 = 100 seconds, more than a minute and a half.
Just by calling RecordCount you triggered a FetchAll and made your program respond to the user in 1m40s instead of 0.2 seconds. The user would get very nervous waiting for that to finish.
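A hedged, FireDAC-flavored sketch of that trade-off (fmOnDemand and RecordCountMode are real FireDAC fetch options; FDQuery1 stands in for any query on the form):
// Default on-demand fetching: Open only pulls the first rowset,
// so the grid fills almost immediately.
FDQuery1.FetchOptions.Mode := fmOnDemand;
FDQuery1.Open;
// Under the default RecordCountMode (cmVisible), reading RecordCount
// forces the remaining rows to be fetched before it can answer.
ShowMessage(IntToStr(FDQuery1.RecordCount));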
2) Imagine you are fetching data from some stored procedure, or from a table another application is inserting rows into. In other words, this is not static read-only data; it is live data being generated while you are downloading it.
So, how many rows are there? Right now it is, for example, 1,000 rows; in a second it would be 1,010 rows; in two seconds maybe 1,050 rows, and so forth.
What is the One True Value when the value changes every moment?
Okay, so you called RecordCount, you SetLength-ed your array to 1,000, and now you read all the data from your query. But downloading the data takes time; it is usually fast, but never instantaneous. Say it took you one second to download those 1,000 rows of query data into your array (or grid). While you were doing it, 10 more rows were generated/inserted, your query has not hit .EOF yet, and you keep fetching rows #1001, #1002, ... #1010 - and you put them into array/grid rows that simply do not exist!
Would it be good?
Or would you cancel your query when you went past the array/grid boundaries?
That way you would avoid an Access Violation.
But you would have those 10 most recent rows ignored and missed.
Is that good?
3) Your query, when you debug it, returns 10,000 rows. You download them all into your program's local memory by calling RecordCount, and it works like a charm, and you deploy it.
Your client uses your program, the data grows, and one day your query returns not 10,000 rows but 10,000,000.
Your program calls RecordCount to download all those rows; it downloads, for example, 9,000,000 of them...
...and then it crashes with an Out Of Memory error.
enable and disable components according to its recordCount > 0
That is the wrong approach: you fetch data you never actually need (the exact quantity of rows) and then discard it. The examples above show how that makes your program fragile and slow.
All you really want to know is whether there are any rows or none at all.
You do not need to count all the rows and learn their amount; you only wonder whether the query is empty or not.
And that is exactly what you should ask by calling TDataSet.IsEmpty instead of RecordCount.
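A minimal sketch of that in code (the form and action names here are made up for illustration; FDQuery1 is the query from the question):
procedure TMyForm.UpdateActionStates;
begin
  // IsEmpty only checks whether at least one record exists,
  // so it never drags the whole result set across the wire.
  actEditRecord.Enabled := not FDQuery1.IsEmpty;
  actDeleteRecord.Enabled := not FDQuery1.IsEmpty;
end;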

Excel Index Match Small - Formula working only in manual mode

I'm trying to adapt a formula for my needs, but for some reason I can only make it work in Manual calculation mode. If Automatic mode is selected, the formula returns 0 on every row.
In essence, the formula returns ALL matches based on the Blue keyword in column I, matched against sheet DIL-2018-08-14, column H. It all works great ONLY in manual mode, and only after manually recalculating every cell.
Can someone advise if there is a way to avoid this and make it work in automatic mode as well?
The formula is:
=IFERROR(INDEX('DIL-2018-08-14'!$H$9:$H$502,SMALL(IF(ISNUMBER(SEARCH(LEFT(I8,FIND(" ",I8)-1),'DIL-2018-08-14'!$H$9:$H$502)),ROW('DIL-2018-08-14'!$H$9:$H$502)-ROW('DIL-2018-08-14'!$H$9)+1),COUNTIF($J$7:J8,"*"&LEFT(I8,FIND(" ",I8)-1)&"*")+1)),"")
Here are the steps to fix it on your own:
Change the whole formula to something with hardcoded values like this:
=IFERROR(INDEX('DIL-2018-08-14'!$H$9:$H$502,1,1),"")
Check whether it works.
Start rebuilding the formula until it fails, building up a step-by-step solution with fewer hardcoded values.
See where it fails.
Think of a solution.
I finally nailed it! This is the working formula in case anyone needs it. I replaced the COUNTIF with a SUM of the occurrences of the blue keywords in the static column I, which resets the counter for every new keyword.
=IFERROR(INDEX(DIL!$H$9:$H$503,SMALL(IF(ISNUMBER(SEARCH(LEFT(I8,FIND(" ",I8)-1),DIL!$H$9:$H$503)),ROW(DIL!$H$9:$H$503)-ROW(DIL!$H$9)+1),SUM(--(ISNUMBER(SEARCH(LEFT(I8,FIND(" ",I8)-1),$I$6:I8)))))),"")

Why is this Alteryx formula returning 0s instead of averages

I was wondering what is wrong with the following formula.
IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF
What I am trying to do: "If the cell is blank, then fill the cell with the average of all other cells called [Age]."
Many thanks all!
We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.
The First Way: the Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" under Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantage of using the tool directly is that it is much less complicated than the other way of doing it. The downsides are 1) if you are attempting to fix a large number of rows relative to your machine specs, it can be incredibly slow (much slower than the next way), and 2) it occasionally errors out in our process without much explanation.
The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream. You will have the same average values for each row in your database. At that point, you can use the Formula tool as you attempted in your question. E.g.
IF IsNull([Age]) THEN [Ave_Age] ELSE [Age] ENDIF
The second way is significantly faster to run for extremely large datasets (e.g., fixing possible nulls in a few dozen columns over 70 million rows), but it is much more time-intensive to set up and must be recreated for each column.
That is not the way the Average function works. You need to pass it the entire list of values, not just one.
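To make that concrete (the field names here are hypothetical): in the Formula tool, Average([Test1],[Test2],[Test3]) returns the mean of those three fields within the current row; it cannot aggregate a single column across rows, which is why the Summarize/Append pattern above is needed.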
