Excel: #CALC! error (Nested Array) when using MAP functions for counting interval overlaps - arrays

I am struggling with the following formula: it works for some scenarios but not for all of them. The name input holds the data set that fails, producing a #CALC! error with the description "Nested Array":
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2), onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
))),
SUM(mapResult)))
))), HSTACK(namesUx, byRowResult)
)
If we replace the input values in the previous formula with the range A2:C4, the expected output in G1:H1 would be:
A graphical representation is also provided to visualize the intervals and their overlaps. From the screenshot, there are 2 overlaps.
If we use the above formula on the same range, we get the following output:
Hovering over the #CALC! cell shows the specific error:
Let's explain the input data and what the formula does:
Input data
First column: N1, N2, N3, representing names
Second column: start of the interval (I am using numeric values, but in my real situation they will be dates)
Third column: end of the interval (I am using numeric values, but in my real situation they will be dates)
Formula
The purpose of the formula is to identify, for each unique name, how many intervals overlap. The calculation goes through each row (BYROW) of the unique names and, for each start-end pair, counts the overlaps with respect to the other start-end pairs. I use FILTER to exclude the current start-end pair with the following condition: FILTER(startDates, incIdx), and I have tested that it works properly.
The condition that excludes the current start-end pair within each BYROW iteration is the following:
1-N(ISNUMBER(XMATCH(seq,idx)))
and it is used as the second input argument of the FILTER function.
The rest just checks the overlap condition.
I separate the logic for a name with only one interval from the rest, because the calculation is different. For a single interval I just want to check that the end date comes after the start date, treating 0 as a special case. I have tested that this particular case works.
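For reference, here is what the formula is meant to count, written as a small Python sketch rather than as an Excel formula (the function names are hypothetical, not part of the original workbook). Two closed intervals overlap when each one starts no later than the other one ends, which is the same test as AND(start <= ee, end >= ss). The sketch groups rows by name and counts each unordered pair of intervals once; it does not reproduce the single-interval special case.

```python
from collections import defaultdict
from itertools import combinations


def overlaps(s1, e1, s2, e2):
    """Closed intervals [s1, e1] and [s2, e2] overlap if each starts no later than the other ends."""
    return s1 <= e2 and e1 >= s2


def count_overlaps_by_name(rows):
    """rows: iterable of (name, start, end). Returns {name: number of overlapping pairs}."""
    groups = defaultdict(list)
    for name, start, end in rows:
        groups[name].append((start, end))
    result = {}
    for name, intervals in groups.items():
        # combinations() yields each unordered pair exactly once,
        # so no division by 2 is needed here
        result[name] = sum(
            overlaps(s1, e1, s2, e2)
            for (s1, e1), (s2, e2) in combinations(intervals, 2)
        )
    return result


data = [("N1", 0, 0), ("N1", 0, 10), ("N1", 10, 20)]
print(count_overlaps_by_name(data))  # {'N1': 2}
```

With the failing input this gives 2 overlaps for N1, matching the count read off the screenshot.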
Testing and workarounds
I have already isolated where and when the issue happens. The problem is in the following call:
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
when startInc and endInc have more than one row. It has nothing to do with the content of the LAMBDA function. I can use:
MAP(startInc, endInc,LAMBDA(ss,ee, 1))
and it still fails. The problem is with the input arrays startInc and endInc. If I use any other array, for example the following one, it doesn't work either:
MAP(seq,LAMBDA(ss, 1))
The result is similar using names, startDates, etc.; even {1;2;3} fails. If I use idx it works, because it is a single value, not an array. Therefore the error happens with any type of array or range.
I have also tested that the input arguments are correct, with the expected shape and values. For example, I replaced the MAP function with TEXTJOIN(",",, startInc)&" ; " (and likewise with endInc) and replaced SUM with CONCAT to concatenate the results.
In terms of input data I tested the following scenarios:
{"N1",0,0;"N1",0,10} -> Works
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N3",20,10} -> Works
{"N1",0,0;"N1",0,10;"N1",10,20} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0;"N1",20,10} -> Error
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N2",20,10} -> Error
The cases that work are those where the inner MAP receives an array of size 1 (each name is repeated fewer than 3 times).
I did some research on the internet about the #CALC! error, but there are not many details about it, and only a very trivial case is provided. I didn't find any indication of a limit on nested calls of the new array functions: BYROW, MAP, etc.
In conclusion, it seems that the following nested structure produces this error:
=MAP({1;2;3}, LAMBDA(n, MAP({4;5;6}, LAMBDA(s, TRUE))))
even for a trivial case like this.
By contrast, the following works:
=MAP({1;2;3}, LAMBDA(n, REDUCE("",{4;5;6}, LAMBDA(a,s, TRUE))))
because the output of REDUCE is not an array.
Any suggestions on how to circumvent this limitation in my original formula? Is this really a case where an array function cannot use another array as input? Is it a bug?

As @JosWoolley pointed out:
LAMBDA's calculation parameter should return a single value and not an array
I hadn't seen it that way, nor deduced it from the #CALC! Nested Array error definition:
The nested array error occurs when you try to input an array formula that contains an array. To resolve the error, try removing the second array...For example, =MUNIT({1,2}) is asking Excel to return a 1x1 array, and a 2x2 array, which isn't currently supported. =MUNIT(2) would calculate as expected
So the alternative is to remove the second MAP call. The following link gave me an idea of how to do it: Identify overlapping dates and times in Excel. Using SUMPRODUCT or SUM serves the purpose, because the inner calculation then returns a single value for each interval.
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
SUMPRODUCT((startInc <= end) * (endInc >= start ))
))),SUM(mapResult)))/2
))), HSTACK(namesUx, byRowResult)
)
We need to divide the result by 2, because each overlap is counted in both directions: A overlaps with B and B overlaps with A.
It can be simplified further, because there is no need to build startInc and endInc to exclude the interval being checked. We can include it (an interval with start <= end always overlaps itself) and subtract one overlap. This is the way to do it:
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0,
endDates, startDates + 1),0, 1),
SUM(MAP(startDates, endDates, LAMBDA(start,end,
SUMPRODUCT((startDates <= end) * (endDates >= start ))-1)))/2)
))), HSTACK(namesUx, byRowResult)
)
Here is the output, using the corresponding range A2:C4 instead of the array constant as input. The intervals are also shown graphically (highlighted), and cell G2 contains the previous formula:
Note: since SUMPRODUCT is used here with a single argument, it can be replaced with SUM.
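The "include the interval itself, subtract one, divide by 2" idea can be sketched with NumPy broadcasting as a rough analogue of the SUM(MAP(... SUMPRODUCT(...)-1))/2 part for a single name (count_overlaps is a hypothetical name). Like the formula, it assumes every interval has start <= end: an interval with end < start does not overlap itself, so subtracting 1 for it would undercount.

```python
import numpy as np


def count_overlaps(starts, ends):
    """Count overlapping pairs among closed intervals [start, end] for one name."""
    s = np.asarray(starts)
    e = np.asarray(ends)
    # m[i, j] is True when interval j overlaps interval i; row i holds the terms that
    # SUMPRODUCT((startDates <= end) * (endDates >= start)) adds up for interval i
    m = (s[None, :] <= e[:, None]) & (e[None, :] >= s[:, None])
    # Each row includes the self-comparison, so subtract 1 per interval ...
    per_interval = m.sum(axis=1) - 1
    # ... and every pair appears twice (i with j, j with i), hence the division by 2
    return int(per_interval.sum()) // 2


print(count_overlaps([0, 0, 10], [0, 10, 20]))  # 2
```

Building the full n x n comparison matrix is fine for a handful of intervals per name, which is the situation in the question.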

Related

Python: Finding the row index of a value in 2D array when a condition is met

I have a 2D array PointAndTangent of dimension 8500 x 5. The data is row-wise, with 8500 data rows and 5 data values in each row. I need to extract the row index of an element in the 4th column when this condition is met, for any s:
abs(PointAndTangent[:,3] - s) <= 0.005
I just need the row index of the first match for the above condition. I tried using the following:
index = np.all([[abs(s - PointAndTangent[:, 3])<= 0.005], [abs(s - PointAndTangent[:, 3]) <= 0.005]], axis=0)
i = int(np.where(np.squeeze(index))[0])
which doesn't work. I get the following error:
i = int(np.where(np.squeeze(index))[0])
TypeError: only size-1 arrays can be converted to Python scalars
I am not very proficient with NumPy in Python. Any suggestions would be great. I am trying to avoid a for loop, as this is a small part of a huge simulation I am running.
Thanks!
Possible Solution
I used the following
idx = (np.abs(PointAndTangent[:,3] - s)).argmin()
It seems to work. It returns the row index of the value in the 4th column nearest to s (note that this returns the nearest value even when it is farther than 0.005 from s).
You were almost there. np.where is one of the most abused functions in numpy. Half the time, you really want np.nonzero, and the other half, you want to use the boolean mask directly. In your case, you want np.flatnonzero or np.argmax:
mask = abs(PointAndTangent[:,3] - s) <= 0.005
mask is a 1D boolean array that is True where the condition is met and False elsewhere. You can get the indices of all the True values with flatnonzero and select the first one:
index = np.flatnonzero(mask)[0]
Alternatively, you can select the first one directly with argmax:
index = np.argmax(mask)
The two solutions behave differently when no row meets your condition. The former does indexing, so it will raise an IndexError. The latter will return zero, which can also be a legitimate result.
Both can be written as a one-liner by replacing mask with the expression that was assigned to it.
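A small helper that combines the mask with a check for the no-match case might look like this (first_match is a hypothetical name; pass it a column such as PointAndTangent[:, 3]). It uses argmax, which returns the index of the first occurrence of the maximum, i.e. the first True, and returns None instead of a misleading 0 when nothing matches.

```python
import numpy as np


def first_match(col, s, tol=0.005):
    """Return the first row index where |col - s| <= tol, or None if no row matches."""
    mask = np.abs(np.asarray(col) - s) <= tol
    if not mask.any():
        # np.argmax(mask) would return 0 here, and np.flatnonzero(mask)[0] would raise IndexError
        return None
    return int(np.argmax(mask))  # first True in the boolean mask


col = np.array([0.0, 0.1, 0.2, 0.201])
print(first_match(col, 0.2))  # 2
print(first_match(col, 5.0))  # None
```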

Fast way to count duplicates in 30000 rows (Libreoffice Calc)

Actually, I already have a partial answer!!! Conditional formatting with "Cell value is" -> "duplicate" !!!
This way a check is performed for each user's new entry in "real time".
I need to check if duplicate entries exist in 30000 rows of a column (any value, but not blanks!) . I would like to keep track of how many duplicates during the filling process.
OK, conditional formatting is a very effective visual indication and fast enough for my needs, but as I am not able to loop over the cells to check their color (I found some people against this approach, though it would be so easy!), I need to find an alternative way to count the duplicates (as a whole; no need to identify how many for each case!).
I tried the formula:
=SUMPRODUCT((COUNTIF(F2:F30001;$F$2:$F$30001)>1))
It works, but it takes two minutes to finish.
If you want to replicate my case: my 30000 entries are formatted as the letter "A" followed by a number between 100000 and 999999, e.g., A354125, A214547, etc. To save time, generate them with =CONCATENATE("A";RANDBETWEEN(100000;999999)) and paste the results as text.
Thanks!
PS: Does anybody know the algorithm used to find the duplicates in conditional formatting (it is fast)?
A macro solution is not the best, but is acceptable! ;)
The =SUMPRODUCT((COUNTIF(F2:F30001;$F$2:$F$30001)>1)) must do the following: count whether $F$2 is in F2:F30001, then count whether $F$3 is in F2:F30001, ..., then count whether $F$30001 is in F2:F30001. So it must loop over the full array F2:F30001 for every single item, which is about 30000 × 30000 = 900 million comparisons.
The fastest way to count duplicates in an array is to avoid looping over the full array for every single item. One way is to sort first; there are very fast quicksort methods. Another is to use collections, which by definition can only hold unique items.
The following code uses the second way. The keys of a Collection must be unique. Adding an item having a duplicate key fails.
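Public Function countDuplicates(vArray As Variant, Optional inclusive As Boolean) As Variant
  On Error Goto wrong
  If IsMissing(inclusive) Then inclusive = False
  ' Collection keys must be unique: one collection holds every distinct value,
  ' the other holds each value that has been seen more than once
  oDuplicatesCollection = new Collection
  oUniqueCollection = new Collection
  lCountAll = 0
  For Each vValue In vArray
    If contains(oUniqueCollection, CStr(vValue)) Then
      ' Seen before: record it as a duplicate (adding an existing key fails, so ignore that error)
      On Error Resume Next
      oDuplicatesCollection.Add 42, CStr(vValue)
      ' Restore the function's error handler (On Error Goto 0 would disable it)
      On Error Goto wrong
    Else
      ' First occurrence of this value
      oUniqueCollection.Add 42, CStr(vValue)
    End If
    lCountAll = lCountAll + 1
  Next
  ' Extra occurrences = all values - distinct values; inclusive also counts the first occurrence of each duplicated value
  countDuplicates = lCountAll - oUniqueCollection.Count + IIF(inclusive, oDuplicatesCollection.Count, 0)
  Exit Function
wrong:
  'xray vArray
  countDuplicates = CVErr(123)
End Function

' Returns True if the collection already has an item with the key sKey
Function contains(oCollection As Collection, sKey As String)
  On Error Goto notContains
  oCollection.Item(sKey)
  contains = True
  Exit Function
notContains:
  contains = False
End Function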
The function can be called:
=COUNTDUPLICATES(F2:F30001, TRUE())
This should return the same result as your
=SUMPRODUCT((COUNTIF(F2:F30001,$F$2:$F$30001)>1))
The optional second parameter inclusive means the count includes every occurrence of the values that are present multiple times. For example, {A1, A2, A2, A2, A3} contains A2 three times. Counting inclusively, the result is 3. Counting non-inclusively, the result is 2, because there are 2 extra occurrences of A2 as duplicates.
As you can see, the function gathers much more information than just the count of the duplicates. oDuplicatesCollection contains each duplicated item and oUniqueCollection contains each unique item, so this code could also be used to get all unique items or all duplicated items.
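For comparison, the same hash-based counting can be sketched in Python with collections.Counter, which plays the role of the Collection keys. This is only an illustration of the approach, not LibreOffice code, and count_duplicates is a hypothetical name. Like the Basic function, it compares values as strings and does not skip blanks.

```python
from collections import Counter


def count_duplicates(values, inclusive=False):
    """Number of extra occurrences, plus (when inclusive) one more for each value seen more than once."""
    counts = Counter(str(v) for v in values)  # keys compared as strings, like CStr(vValue)
    total = sum(counts.values())              # lCountAll
    unique = len(counts)                      # oUniqueCollection.Count
    repeated = sum(1 for n in counts.values() if n > 1)  # oDuplicatesCollection.Count
    return total - unique + (repeated if inclusive else 0)


values = ["A1", "A2", "A2", "A2", "A3"]
print(count_duplicates(values))                  # 2
print(count_duplicates(values, inclusive=True))  # 3
```

Each value is hashed once, so the work grows roughly linearly with the number of cells instead of quadratically as with COUNTIF over the whole range.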

Why is the MATCH function not working how anticipated in Excel with array formulas? (example attached)

File Located Here (using Google Drive)
Explanation and background: I have a list of animals and their color in column G (animal) and H (color). I have a list of the unique colors in column A, and a list of unique animals in column D. In column B, next to the list of unique colors, I need to know which animal has that color most often (raw number, not proportion). I cannot use any additional helper cells.
I have established the max by animal for each color with the formula {=MAX(COUNTIFS(H:H,A2,G:G,$D$2:$D$20))} with the result being 7, but that's as far as I can get. I can set the COUNTIFS statement equal to the MAX statement like so: COUNTIFS(H:H,A2,G:G,E1:E19)=MAX(COUNTIFS(H:H,A2,G:G,E1:E19)), which can be used as a TRUE/FALSE array in an array formula. Finally, I try to use MATCH with the previous formula as my array, looking for a TRUE value to retrieve the position of the only TRUE value in the array, but it does not seem to find it and instead gives me 19, which is the length of the entire array.
When I step through the formula, here's the step right before it results in 19:
[Screenshot: step before final result]
Why does this MATCH not work?
Put this in B2:
=CHOOSE(IF(MAX(COUNTIFS(G:G,$E$1,H:H,A2),COUNTIFS(G:G,$E$2,H:H,A2),COUNTIFS(G:G,$E$3,H:H,A2),COUNTIFS(G:G,$E$4,H:H,A2),COUNTIFS(G:G,$E$5,H:H,A2),COUNTIFS(G:G,$E$6,H:H,A2),COUNTIFS(G:G,$E$7,H:H,A2),COUNTIFS(G:G,$E$8,H:H,A2),COUNTIFS(G:G,$E$9,H:H,A2))>MAX(COUNTIFS(G:G,$E$10,H:H,A2),COUNTIFS(G:G,$E$11,H:H,A2),COUNTIFS(G:G,$E$12,H:H,A2),COUNTIFS(G:G,$E$13,H:H,A2),COUNTIFS(G:G,$E$14,H:H,A2),COUNTIFS(G:G,$E$15,H:H,A2),COUNTIFS(G:G,$E$16,H:H,A2),COUNTIFS(G:G,$E$17,H:H,A2),COUNTIFS(G:G,$E$18,H:H,A2),COUNTIFS(G:G,$E$19,H:H,A2)),1,2),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$1,H:H,A2),COUNTIFS(G:G,$E$2,H:H,A2),COUNTIFS(G:G,$E$3,H:H,A2),COUNTIFS(G:G,$E$4,H:H,A2))>MAX(COUNTIFS(G:G,$E$5,H:H,A2),COUNTIFS(G:G,$E$6,H:H,A2),COUNTIFS(G:G,$E$7,H:H,A2),COUNTIFS(G:G,$E$8,H:H,A2),COUNTIFS(G:G,$E$9,H:H,A2)),1,2),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$1,H:H,A2),COUNTIFS(G:G,$E$2,H:H,A2))>MAX(COUNTIFS(G:G,$E$3,H:H,A2),COUNTIFS(G:G,$E$4,H:H,A2)),1,2),IF(COUNTIFS(G:G,$E$1,H:H,A2)>COUNTIFS(G:G,$E$2,H:H,A2),"bat","raccoon"),IF(COUNTIFS(G:G,$E$3,H:H,A2)>COUNTIFS(G:G,$E$4,H:H,A2),"bear","goat")),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$5,H:H,A2),COUNTIFS(G:G,$E$6,H:H,A2))>MAX(COUNTIFS(G:G,$E$7,H:H,A2),COUNTIFS(G:G,$E$8,H:H,A2),COUNTIFS(G:G,$E$9,H:H,A2)),1,2),IF(COUNTIFS(G:G,$E$5,H:H,A2)>COUNTIFS(G:G,$E$6,H:H,A2),"moose","turtle"),CHOOSE(IF(COUNTIFS(G:G,$E$7,H:H,A2)>MAX(COUNTIFS(G:G,$E$8,H:H,A2),COUNTIFS(G:G,$E$9,H:H,A2)),1,2),"squirrel",IF(COUNTIFS(G:G,$E$8,H:H,A2)>COUNTIFS(G:G,$E$9,H:H,A2),"snake","bird")))),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$10,H:H,A2),COUNTIFS(G:G,$E$11,H:H,A2),COUNTIFS(G:G,$E$12,H:H,A2),COUNTIFS(G:G,$E$13,H:H,A2),COUNTIFS(G:G,$E$14,H:H,A2))>MAX(COUNTIFS(G:G,$E$15,H:H,A2),COUNTIFS(G:G,$E$16,H:H,A2),COUNTIFS(G:G,$E$17,H:H,A2),COUNTIFS(G:G,$E$18,H:H,A2),COUNTIFS(G:G,$E$19,H:H,A2)),1,2),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$10,H:H,A2),COUNTIFS(G:G,$E$11,H:H,A2))>MAX(COUNTIFS(G:G,$E$12,H:H,A2),COUNTIFS(G:G,$E$13,H:H,A2),COUNTIFS(G:G,$E$14,H:H,A2)),1,2),IF(COUNTIFS(G:G,$E$10,H:H,A2)>COUNTIFS(G:G,$E$11,H:H,A2),"cat","dog"),CHO
OSE(IF(COUNTIFS(G:G,$E$12,H:H,A2)>MAX(COUNTIFS(G:G,$E$13,H:H,A2),COUNTIFS(G:G,$E$14,H:H,A2)),1,2),"rabbit",IF(COUNTIFS(G:G,$E$13,H:H,A2)>COUNTIFS(G:G,$E$14,H:H,A2),"sheep","cow"))),CHOOSE(IF(MAX(COUNTIFS(G:G,$E$15,H:H,A2),COUNTIFS(G:G,$E$16,H:H,A2))>MAX(COUNTIFS(G:G,$E$17,H:H,A2),COUNTIFS(G:G,$E$18,H:H,A2),COUNTIFS(G:G,$E$19,H:H,A2)),1,2),IF(COUNTIFS(G:G,$E$15,H:H,A2)>COUNTIFS(G:G,$E$16,H:H,A2),"chicken","llama"),CHOOSE(IF(COUNTIFS(G:G,$E$17,H:H,A2)>MAX(COUNTIFS(G:G,$E$18,H:H,A2),COUNTIFS(G:G,$E$19,H:H,A2)),1,2),"pig",IF(COUNTIFS(G:G,$E$18,H:H,A2)>COUNTIFS(G:G,$E$19,H:H,A2),"horse","deer")))))
Then drag it down to B10.
Key things I learnt here:
CHOOSE() is a good alternative to heavily nested IF(). It helped me avoid getting lost in the formula breakdown.
Cascaded binary evaluation is a good way to break up a long list of repeated evaluations.
"I cannot use any additional helper cells." <-- If the OP doesn't mind, you can always use a helper sheet instead. This requirement really pushed the limits of Excel formulas: the first version of my solution needed more than 10000 characters, which is beyond Excel's 8192-character formula limit.
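The cascaded binary evaluation behind that CHOOSE formula can be sketched in Python as a recursive split (binary_argmax and its inputs are hypothetical). Splitting at len // 2 reproduces the formula's grouping of the 19 animals (9 vs 10, then 4 vs 5, 2 vs 3, and so on), and because the formula compares with a strict >, ties go to the right-hand group.

```python
def binary_argmax(labels, counts):
    """Return the label with the largest count by repeatedly splitting the list in two,
    like CHOOSE(IF(MAX(left) > MAX(right), 1, 2), <left branch>, <right branch>)."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2  # 19 -> 9 | 10, 9 -> 4 | 5, 5 -> 2 | 3, 3 -> 1 | 2
    if max(counts[:mid]) > max(counts[mid:]):  # strict '>' so ties go right
        return binary_argmax(labels[:mid], counts[:mid])
    return binary_argmax(labels[mid:], counts[mid:])


print(binary_argmax(["a", "b", "c", "d"], [1, 5, 2, 0]))  # b
```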
Try this array formula in B2 then copy to B3:B10:
= INDEX( $G$1:$G$510,
MATCH( MAX(
IF( $H$1:$H$510 = A2,
COUNTIFS( $G$1:$G$510, $G$1:$G$510, $H$1:$H$510, $H$1:$H$510), "" ) ),
IF( $H$1:$H$510 = A2,
COUNTIFS( $G$1:$G$510, $G$1:$G$510, $H$1:$H$510, $H$1:$H$510), "" ), 0 ) )

x and y lengths differ in an apply

So I'm trying to run an apply function over an array. The idea is to look at the value in the Risk_Factor column and, if it is 1, use OnsetFunction, and if it is 0, use HighOnsetFunction. This would then produce a column of values which populates another column in the array.
apply(OutComes, 1, function(x) {
  if (x["Risk_Factor"] == 1) {
    OnsetFunction()
  } else {
    HighOnsetFunction()
  }
})
I'm having trouble with the apply function above and keep getting this message.
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
There are only five rows in the array at the moment as I'm trying to make sure the code works on a small group before I extend it to be many people, but I'm not sure what the x and y are. I've seen this message with graphs, but never with this before.
I think you want ifelse, but you are using apply with an if.
Try:
ifelse(OutComes$Risk_Factor==1, OnsetFunction(), HighOnsetFunction())
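The same vectorized idea looks like this in Python with NumPy's np.where, shown only as an analogue of R's ifelse (onset and high_onset are hypothetical stand-ins for OnsetFunction and HighOnsetFunction, each returning one value per row). Both branches are computed for all rows and the condition picks element by element, so no apply/if loop is needed.

```python
import numpy as np


def onset(n):
    # hypothetical stand-in for OnsetFunction(): one value per row
    return np.arange(n) + 100


def high_onset(n):
    # hypothetical stand-in for HighOnsetFunction(): one value per row
    return np.arange(n) + 200


risk_factor = np.array([1, 0, 1, 1, 0])
n = len(risk_factor)
# Take onset() where Risk_Factor == 1 and high_onset() elsewhere, row by row
onset_values = np.where(risk_factor == 1, onset(n), high_onset(n))
print(onset_values)  # [100 201 102 103 204]
```

If a function returns a single value instead of one per row, both R's ifelse and np.where recycle or broadcast it, so every selected row gets that same value.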

Excel nested if array

I am doing a nested MIN(IF()) array formula, and am having an issue with it reading blanks.
=MIN(IF(Sheet1!$C:$C<=A24,IF(Sheet1!$AE:$AE>A24,Sheet1!$C:$C),IF(Sheet1!$C:$C<=A24,IF(Sheet1!$AE:$AE="",Sheet1!$C:$C))))
So in English, I'm asking that if the dates in sheet 1 column C are less than or equal to the value in A24, and the date in sheet 1 column AE is after the date in A24, OR the value in sheet 1 column AE is blank, give me the earliest date of what's left from column C. I hope that makes sense!
Any help would be greatly appreciated as I have spent literally hours on this trying isblanks, further nested if, all with no joy.
If you have Excel 2010 this AGGREGATE() Function will work:
=AGGREGATE(15,6,(Sheet1!C:C/((Sheet1!C:C<=A24)*((Sheet1!AE:AE>A24)+(Sheet1!AE:AE="")))),1)
If you have 2007 or earlier then the array formula should do it:
=MIN(IF(((Sheet1!C:C<=A24)*((Sheet1!AE:AE>A24)+(Sheet1!AE:AE=""))),Sheet1!C:C))
Being an array formula, it must be entered with Ctrl-Shift-Enter instead of just Enter or Tab.
The issue is that, when using OR logic in arrays, one should use +. So now if either (Sheet1!AE:AE>A24) or (Sheet1!AE:AE="") is true, the sum returns 1, because 0+1=1 (and if both were true, the resulting 2 would still count as TRUE).
The AND portion uses *, because 0*1=0, so both conditions need to be true to return 1, i.e. TRUE.
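The same + / * logic can be sketched with NumPy arrays (the column values below are made-up sample data, and blanks are represented by a separate boolean array because NumPy has no blank cell).

```python
import numpy as np

a24 = 30
c = np.array([10, 20, 30, 40])                   # column C (dates as serial numbers)
ae = np.array([35, 0, 25, 0])                    # column AE, blanks read as 0
ae_blank = np.array([False, True, False, True])  # which AE cells are blank

# OR -> addition: 1 if either test is true (2 if both are)
or_part = (ae > a24).astype(int) + ae_blank.astype(int)
# AND -> multiplication: non-zero only if C <= A24 and the OR part is non-zero
mask = (c <= a24).astype(int) * or_part
earliest = c[mask != 0].min()
print(earliest)  # 10
```

One difference from Excel: if no row qualifies, c[mask != 0] is empty and .min() raises ValueError, whereas MIN(IF(...)) returns 0.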
Do you have to use nested if's?
Why not have the IF statement evaluate all three conditions? Use an OR function to check two conditions (the cell in column AE is blank OR its value is greater than A24). Then feed the result of the OR function into an AND function along with the third check (column C less than or equal to A24). If the entire logical test of the IF function returns TRUE, use a MIN function to give you the minimum value in column C. Likewise, if the logical test of the IF function evaluates to FALSE, do something else (e.g. "No Match").
Something like this:
=IF(AND(COL_C <= $A$24, OR(COL_AE > $A$24, ISBLANK(COL_AE))), MIN(COL_C), "No Match")
