I am struggling with the following formula, it works for some scenarios but not in all of them. The name input has the data set that is failing, getting an #CALC! error with the description "Nested Array":
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2), onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
))),
SUM(mapResult)))
))), HSTACK(namesUx, byRowResult)
)
If we replace the input values in previous formula with the following range: A2:C4, in G1:H1 would be the expected output:
Provided also a graphical representation to visualize the intervals and their corresponding overlap. From the screenshot, we have 2 overlaps.
If we use the above formula for the same range we get the following output:
If we hover the #CALC! cell, it informs about the specific error:
Let's explain the input data and what the formula does:
Input data
First column: N1, N2, N3, represents names
Second Column: Start of the interval (I am using numeric values, but in my real situation will be dates)
Third Column: End of the interval (I am using numeric values, but in my real situation will be dates)
Formula
The purpose of the formula is to identify for each unique names, how many intervals overlap. The calculation goes by each row (BYROW) of the unique names and for each pair of start-end values, counts the overlaps with respect to the other start-end values. I use FILTER to exclude the current start-end pair with the following condition: FILTER(startDates, incIdx) and I tested it works properly.
The condition to exclude the start data of the current name of the iteration of BYROW is the following:
1-N(ISNUMBER(XMATCH(seq,idx)))
and used as second input argument of the FILTER function.
The rest is just to check the overlap range condition.
I separate the logic when a name has only one interval, from the rest because the calculation is different, For a single interval I just want to check that the end date comes after start date and treat the special case of 0. This particular case I tested it works.
Testing and workarounds
I already isolated where is the issue and when it happens. The problem happens in the following call:
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
when startInc and endInc has more than one row. It has nothing to do with the content of the LAMBDA function. I can use:
MAP(startInc, endInc,LAMBDA(ss,ee, 1))
and still fails. The problem is with the input arrays: startInc, endInc. If I use any other array for example the following ones it doesn't works:
MAP(seq,LAMBDA(ss, 1))
Similar result using names, startDates, etc, even if I use: {1;2;3} fails. If use use idx it works, because it is not an array. Therefore the error happens with any type of array or range.
I have also tested that the input arguments are correct having the correct shape and values. For example replacing the MAP function with: TEXTJOIN(",",, startInc)&" ; " (and also with endInc) and replacing SUM with CONCAT to concatenate the result.
In terms of input data I tested the following scenarios:
{"N1",0,0;"N1",0,10} -> Works
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N3",20,10} -> Works
{"N1",0,0;"N1",0,10;"N1",10,20} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0;"N1",20,10} -> Error
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N2",20,10} -> Error
The cases that work are because it goes to the MAP function an array of size 1 (number of duplicated names is less than 3)
I did some research on internet about #CALC! error, but there is no too much details about this error and it is provided only a very trivial case. I didn't find any indication in the limit of nested calls of the new arrays functions: BYROW, MAP, etc.
Conclusion, it seems that the following nested structure produce this error:
=MAP({1;2;3}, LAMBDA(n, MAP({4;5;6}, LAMBDA(s, TRUE))))
even for a trivial case like this.
On contrary the following situation works:
=MAP({1;2;3}, LAMBDA(n, REDUCE("",{4;5;6}, LAMBDA(a,s, TRUE))))
because the output of REDUCE is not an array.
Any suggestion on how to circumvent this limitation in my original formula?, Is this a real situation of an array that cannot use another array as input?, Is it a bug?
As #JosWoolley pointed out:
LAMBDA's calculation parameter should return a single value and not an
array
I haven't seen that way, or deduced it from #CALC! Nested Array error definition:
The nested array error occurs when you try to input an array formula
that contains an array. To resolve the error, try removing the second
array...For example, =MUNIT({1,2}) is asking Excel to return
a 1x1 array, and a 2x2 array, which isn't currently supported.
=MUNIT(2) would calculate as expected
so the alternative is then to remove this second MAP call. The following link gave me an idea about how to do it: Identify overlapping dates and times in Excel, therefore using SUMPRODUCT or SUM can serve the purpose.
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
SUMPRODUCT((startInc <= end) * (endInc >= start ))
))),SUM(mapResult)))/2
))), HSTACK(namesUx, byRowResult)
)
We need to divide by 2 the result, because we are counting the overlapping in both directions. A overlaps with B and vice versa.
It can be further simplified because there is no need to build the names: startInc, endInc to exclude the range itself we are checking for overlap. We can include it and subtract one overlap. This is the way to do it:
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0,
endDates, startDates + 1),0, 1),
SUM(MAP(startDates, endDates, LAMBDA(start,end,
SUMPRODUCT((startDates <= end) * (endDates >= start ))-1)))/2)
))), HSTACK(namesUx, byRowResult)
)
Here, the output, removing the array as input and using the corresponding range A2:C4. Providing also a graphical representations of the intervals (highlighted) and in cell G2 putting the corresponding previous formula:
Note: Since we are using SUMPRODUCT with a single input, it can be replaced with SUM.
For example I have a cell array like:
Column1----Column2
'aaaa'--------4
'bbbb'--------5
'cccc'---------2
'cccc'---------0
'dddd'--------0
'dddd'--------3
'eeee'--------0
'ffff'-----------0
And what I want is to merge the rows which has same elements. Finally what I want to obtain is:
'aaaa'--------4
'bbbb'--------5
'cccc'---------2
'dddd'--------3
'eeee'--------0
'ffff'-----------0
I'm looking for an answer without for loops.
Find all completely unique strings (i.e ffff_0 and ffff__1 are unique, but aaaa_1 and aaaa___1 are obviously not unique. (apparently underscores represent formatting?)
Once you have that, do the same thing with just the letters.
I am pretty sure you will have to do that (above) in some capacity to get your desired output, and if that's the case, I think you are right on the edge of the speed tradeoff between for loops and all that extra memory allocation and sorting through values finding unique ones.
Try this:
arr([arr{:,2}] ~= 0,:)
arr -
rows: all rows of arr such that the second column does not equal 0,
columns: all columns
Might be a syntax error in there somewhere, been a while since I used Matlab...
Edit: New answer
non_zero = transpose([arr{:,2}] ~= 0);
arr = arr(non_zero | ~ismember(arr(:,1),arr(non_zero,1)),:)
Essentially what I'm doing: Get all rows such that the right hand side is not zero, OR the left-hand side is not a member of the left-hand sides of the non-zero rows. The latter condition will only be satisfied for rows with zero with no matching left-hand side in the non-zero rows (and hence not repeats). Now keep in mind that this will still not work if you have any duplicate rows (same left and right-hand side). If that's a possibility then do this:
non_zero = transpose([arr{:,2}] ~= 0);
arr = arr(non_zero | ~ismember(arr(:,1),arr(non_zero,1)),:);
[~,U] = unique(arr(:,1));
arr = arr(U,:)