Excel - Retrieve value in an array - arrays

In my organisation, there are a couple of Excel functions that return large arrays, with more than 2,000 rows and several columns.
Dummy code / dummy example:
{=FunctionThatReturnArray(param1)}
where param1 is the date.
I need to retrieve the selling price for the combination « Shoes » « Yellow » for different dates.
I don’t want to display an entire array for every date I’m interested in.
Instead, I would like to display only the value that I need.
I tried to use the INDEX function as below, but because the Shoes/Yellow combination isn’t always in the fifth row, it doesn’t work.
{=INDEX(FunctionThatReturnArray(param1),5,4)}
where 5 is the row number and 4 the column number.
I believe I need to use the MATCH function somehow, but on two different columns.
How could I do that without displaying the entire array on my worksheet?
Thanks in advance and kind regards
Largo

I'm not sure if this is the answer you want, but you are right: you have to match on both the 2nd and 3rd columns and then index into the 4th column, like this:
=INDEX({43466,"Pants","Yellow",40;43466,"Shirt","Green",20;43466,"Shoes","Blue",70;43466,"Shoes","Yellow",75},MATCH(1,
(INDEX({43466,"Pants","Yellow",40;43466,"Shirt","Green",20;43466,"Shoes","Blue",70;43466,"Shoes","Yellow",75},0,2)=B2)*
(INDEX({43466,"Pants","Yellow",40;43466,"Shirt","Green",20;43466,"Shoes","Blue",70;43466,"Shoes","Yellow",75},0,3)=C2),0),4)
I have used an array constant to test it; with your data you would replace each of the three copies of the constant with a call to your array function.
BTW, 43466 is the serial-number representation of 1/1/2019.
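For readers who want to see the logic outside Excel, here is a minimal JavaScript sketch of what the formula computes, assuming the returned array is held as a list of rows; `lookupTwoCriteria` is a hypothetical name, not an Excel function:

```javascript
// Rows copied from the answer's array constant: [date serial, item, colour, price].
const rows = [
  [43466, "Pants", "Yellow", 40],
  [43466, "Shirt", "Green", 20],
  [43466, "Shoes", "Blue", 70],
  [43466, "Shoes", "Yellow", 75],
];

// Mirrors INDEX(arr, MATCH(1, (INDEX(arr,0,2)=item)*(INDEX(arr,0,3)=colour), 0), returnCol).
// Returns null where Excel would return #N/A.
function lookupTwoCriteria(table, item, colour, returnCol) {
  // The product of the two comparisons is 1 only where both columns match.
  const mask = table.map(r => (r[1] === item ? 1 : 0) * (r[2] === colour ? 1 : 0));
  const pos = mask.indexOf(1); // MATCH(1, mask, 0), but 0-based
  return pos === -1 ? null : table[pos][returnCol - 1];
}

console.log(lookupTwoCriteria(rows, "Shoes", "Yellow", 4)); // 75
```

Like MATCH(1, ..., 0), the sketch stops at the first row where both tests are true, so duplicate item/colour pairs would only ever return the first price.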

Related

Arrays, SUM + INDEX/MATCH

Note: tried in Excel and Google Sheets, but I have a preference for Sheets.
Basically I want to get the sum of a group of data using INDEX and MATCH (because the parameters are going to be drop-down dependent):
The desired result is the total of all rows for the selected month.
So this will require a few things:
Converting cell D13 (April) to a month
Converting the "weekof" column to a month
Using INDEX with MATCH twice (once for the row, once for the column), since I'm assuming it involves multiple cell references.
Here is my current solution:
=SUM(INDEX(D5:I9, MATCH(MONTH(D13&1),ARRAYFORMULA(MONTH(C5:C9)),0), MATCH(E12,D4:I4,0)))
This returns only the first matching value:
270
instead of:
804
Why 804? Because it is the total of all the rows for that month:
270+500+34 = 804
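The difference between 270 and 804 comes from MATCH stopping at the first row whose month matches, while the question needs every matching row. A small JavaScript sketch of both behaviours, using hypothetical dates and two hypothetical extra rows around the three April values from the question; `firstMatch` and `sumMatches` are illustrative names:

```javascript
// Hypothetical weekly rows; only the three April values (270, 500, 34) come from the question.
const weekly = [
  { weekOf: new Date(2021, 2, 29), value: 120 }, // hypothetical March row
  { weekOf: new Date(2021, 3, 5), value: 270 },
  { weekOf: new Date(2021, 3, 12), value: 500 },
  { weekOf: new Date(2021, 3, 19), value: 34 },
  { weekOf: new Date(2021, 4, 3), value: 90 },   // hypothetical May row
];

// INDEX(..., MATCH(month, months, 0), ...) stops at the FIRST row whose month matches.
function firstMatch(rows, month) {
  const hit = rows.find(r => r.weekOf.getMonth() === month);
  return hit ? hit.value : null;
}

// The desired result adds up EVERY row whose month matches.
function sumMatches(rows, month) {
  return rows
    .filter(r => r.weekOf.getMonth() === month)
    .reduce((acc, r) => acc + r.value, 0);
}

const APRIL = 3; // JavaScript months are 0-based
console.log(firstMatch(weekly, APRIL), sumMatches(weekly, APRIL)); // 270 804
```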
If you are not restricted to INDEX and MATCH, you can use the following solution:
Add an extra column named "Month" that extracts the month name from the date column with the TEXT function:
=IF(C3<>"",TEXT(C3,"mmmm"),"")
The IF statement ensures that only rows with a date get a month value, since you have to fill this column with the formula above for a fixed number of cells.
Now you can simply use the SUMIF function in cell E13 or wherever you want:
=SUMIF(B:B,D13,D:D)
If you don't want the Month column to appear within your data table, you can put it at the end of the table and hide it.
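A rough JavaScript equivalent of the helper-column approach, with hypothetical dates and amounts; `monthName` and `sumIf` are illustrative names only:

```javascript
const MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Helper column: =IF(C3<>"", TEXT(C3,"mmmm"), ""). null stands for an empty cell.
function monthName(date) {
  return date ? MONTH_NAMES[date.getMonth()] : "";
}

// =SUMIF(criteriaRange, criterion, sumRange), exact comparison only.
function sumIf(criteriaRange, criterion, sumRange) {
  let total = 0;
  for (let i = 0; i < criteriaRange.length; i++) {
    if (criteriaRange[i] === criterion) total += sumRange[i];
  }
  return total;
}

// Hypothetical data.
const dateCol = [new Date(2021, 3, 5), new Date(2021, 3, 12), null, new Date(2021, 4, 3)];
const amountCol = [270, 500, 0, 90];
const monthCol = dateCol.map(monthName); // "April", "April", "", "May"
console.log(sumIf(monthCol, "April", amountCol)); // 770
```

Note that Excel's SUMIF compares text case-insensitively, whereas this sketch uses an exact, case-sensitive comparison.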
Alternatively, you could use FILTER directly and SUM the result, which simplifies the formula to:
Formula:
=SUM(FILTER(D:D, TEXT(C:C,"MMMM") = E13))
UPDATE:
The formula above should also update when the value comes from a dropdown. A dropdown is just a cell whose value is restricted to predetermined options; apart from that, it behaves the same as a normal cell.
To match columns as well, combine INDEX and MATCH with the formula above, as in the modified formula below.
Be careful of circular dependencies: make sure your ranges don't include the cell where you put the formula.
Column Matching:
=SUM(INDEX(FILTER(D:E, TEXT(C:C, "MMMM") = E13),,MATCH(F12, D4:4, 0)))
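The column-matching formula first filters the rows by month and then picks one column by its header before summing. A JavaScript sketch of that order of operations, assuming a hypothetical two-column table with headers "Sales" and "Returns" (the month is compared by index rather than by name for brevity; `sumMonthColumn` is an illustrative name):

```javascript
// Hypothetical sheet fragment: a header row plus dated rows with two value columns.
const headers = ["Sales", "Returns"]; // stands in for the header row starting at D4
const table = [
  { date: new Date(2021, 3, 5), values: [270, 5] },
  { date: new Date(2021, 3, 12), values: [500, 8] },
  { date: new Date(2021, 4, 3), values: [90, 1] },
];

function sumMonthColumn(rows, hdrs, monthIndex, header) {
  const col = hdrs.indexOf(header); // MATCH(F12, D4:4, 0), but 0-based
  if (col === -1) throw new Error("header not found"); // Excel would show #N/A
  return rows
    .filter(r => r.date.getMonth() === monthIndex)  // FILTER(D:E, TEXT(C:C,"MMMM") = E13)
    .reduce((acc, r) => acc + r.values[col], 0);    // SUM(INDEX(..., , col))
}

console.log(sumMonthColumn(table, headers, 3, "Returns")); // 13
```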
You can also use a pivot table and group the dates by year and month.

Long calculation times with XLOOKUP vs INDEX-MIN-COLUMN

I'm using this formula =IF(B24="","",IFERROR(INDEX(Sheet3!$C$3:$EE$3,,MIN(IF(Sheet3!$C$4:$EE$23=(Sheet2!C24&$K$18),COLUMN(Sheet3!$C:$EE)))-2),"NF")) to return a cell value in the top row of an array - a date in this case.
The search criterion is a combination of a unique project number and a 2-character alphanumeric status code for the project. The array consists of 23 rows in which the unique project numbers appear, each combined with different status codes.
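What the INDEX/MIN/IF/COLUMN formula does for each cell can be written as a full scan of the 23-row block, which helps explain why 3,300 copies are slow. The JavaScript sketch below, with hypothetical data and function names, contrasts that per-lookup scan with building a key-to-column map once and answering each lookup from it; in a workbook, the analogue would be a helper range computed once, which is roughly the idea behind the helper-sheet workaround in the answer.

```javascript
// Mirrors IFERROR(INDEX(headerRow,,MIN(IF(grid=key, COLUMN(...)))-2), "NF"):
// scan every cell, keep the leftmost column that holds `key`, return its header.
function firstColumnHeader(headerRow, grid, key) {
  let best = Infinity;
  for (const row of grid) {
    const c = row.indexOf(key);         // leftmost hit in this row (0-based)
    if (c !== -1 && c < best) best = c; // MIN over all rows
  }
  return best === Infinity ? "NF" : headerRow[best];
}

// Alternative: scan the grid once, then answer each lookup from a Map.
function buildIndex(headerRow, grid) {
  const leftmost = new Map();
  for (const row of grid) {
    row.forEach((cell, c) => {
      if (cell === "") return; // skip empty cells
      const prev = leftmost.get(cell);
      if (prev === undefined || c < prev) leftmost.set(cell, c);
    });
  }
  return key => (leftmost.has(key) ? headerRow[leftmost.get(key)] : "NF");
}

// Hypothetical data: each key is project number & status code, like Sheet2!C24&$K$18.
const headerRow = ["2021-01-04", "2021-01-11", "2021-01-18"];
const grid = [
  ["P100A1", "", "P101B2"],
  ["", "P100B2", ""],
];
console.log(firstColumnHeader(headerRow, grid, "P100B2")); // 2021-01-11
const lookup = buildIndex(headerRow, grid);
console.log(lookup("P101B2"), lookup("P999ZZ")); // 2021-01-18 NF
```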
So essentially, I'm building a FILTERED project status dashboard that returns dates linked to the relevant project status.
The code above is inspired by ( LINK ), which uses a very similar layout but links town suburbs to postal codes instead of project numbers to status codes. The formula works well (though it is not entered as an array formula), but I don't have a single formula in the sheet: I have 3,300 occurrences of it.
The problem arises when the user changes the FILTER: Excel recalculates the entire dashboard, which takes anywhere from 2 to 5 minutes. You can press Escape to cancel the calculation after setting the filter, but Excel just starts calculating again a few seconds later. After that, Excel's response is sluggish and almost unusable. Yes, our hardware is pretty weak...
I tried XLOOKUP as well, but I can't set lookup_array to the 2-D range Sheet3!$C$4:$EE$23 because its shape doesn't match return_array (Sheet3!$C$3:$EE$3). Concatenating the lookup rows with & works, but then you'd have to do that for all 23 rows and, again, multiply that by 3,300.
I thought of creating a UDF, but the function would still be called every time Excel recalculates after filtering: 3,300 calls.
Any ideas on how to make the INDEX version run faster, or how to make XLOOKUP accept Sheet3!$C$4:$EE$23 as its lookup_array, in the hope that it would run faster?
Thank you!
Not really an elegant solution, but it works.
I imported the dataset into a helper sheet, where for each cell I combined the cell value with the value in column A of its row (a name in this case) and the date in row 1 of its column, using an underscore as the delimiter.
This new data range was then given a unique name, EE in this case.
On a second helper sheet, I entered the formula =INDEX(Filtered,1+INT((ROW('Sheet1'!C3)-1)/COLUMNS(Filtered)),MOD(ROW('Sheet1'!C3)-1+COLUMNS(Filtered),COLUMNS(Filtered))+1) and dragged it down until it returned a #REF! error, keeping the rows up to the one just before the error.
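The INDEX formula walks the named range row by row: the k-th cell down the column maps to row 1 + INT((k-1)/COLUMNS) and column MOD(k-1, COLUMNS) + 1 (the extra + COLUMNS(Filtered) inside MOD does not change the result for k ≥ 1). A JavaScript sketch of the same index arithmetic, with a hypothetical `flattenRowMajor` helper; the loop stops where Excel would start returning #REF!:

```javascript
// Flattens a 2-D range row by row, mapping the k-th output cell (1-based) to
// row 1 + INT((k-1)/cols) and column MOD(k-1+cols, cols) + 1.
function flattenRowMajor(range) {
  const cols = range[0].length;
  const total = range.length * cols;
  const out = [];
  for (let k = 1; k <= total; k++) {
    const r = 1 + Math.floor((k - 1) / cols);
    const c = ((k - 1 + cols) % cols) + 1;
    out.push(range[r - 1][c - 1]);
  }
  return out; // beyond `total`, Excel's INDEX would return #REF!
}

console.log(flattenRowMajor([["a", "b", "c"], ["d", "e", "f"]]).join(",")); // a,b,c,d,e,f
```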
This transposes all the data into a single column G. Using =UNIQUE(SORT(FILTER(B3:B3240,B3:B3240<> "",""))) then gives me a sorted list of unique values in column H, on which I then run
=IF(H3="","",LEFT(H3, SEARCH("_",H3,1)-1)) for the first data value in I, and
=IF(H3="","",MID(H3, SEARCH("_",H3) + 1, SEARCH("_",H3,SEARCH("_",H3)+1) - SEARCH("_",H3) - 1)) for the middle data value in J, and
=IF(H3="","",IFERROR(TEXT(RIGHT(H3,5),"yyyy-mm-dd"),"NF")) for the last data value in K.
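The three parsing formulas split each combined key on its underscores and turn the trailing 5-digit date serial into a yyyy-mm-dd string. A JavaScript sketch under the same assumptions (the name contains no underscore and the key ends in a 5-digit serial); `splitKey` and the sample key are hypothetical:

```javascript
// Splits a "name_value_serial" key built with "_" as the delimiter, mirroring
// the LEFT / MID / RIGHT formulas. Assumes the name itself contains no "_".
function splitKey(key) {
  const first = key.indexOf("_");
  const second = key.indexOf("_", first + 1);
  const name = key.slice(0, first);           // LEFT(H3, SEARCH("_",H3)-1)
  const value = key.slice(first + 1, second); // the MID(...) between the underscores
  const serial = Number(key.slice(-5));       // RIGHT(H3, 5): a 5-digit Excel date serial
  // Excel serials count days from 1899-12-30 (valid from March 1900 onward).
  const date = Number.isInteger(serial)
    ? new Date(Date.UTC(1899, 11, 30) + serial * 86400000).toISOString().slice(0, 10)
    : "NF"; // assumed fallback for a non-numeric tail
  return { name, value, date };
}

const parts = splitKey("Smith_A1_43466"); // hypothetical key
console.log(parts.name, parts.value, parts.date); // Smith A1 2019-01-01
```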
Then just run XLOOKUP across columns I, J and K.
It runs quickly and solves a few of the other issues I had as well.
The second data set has just over 35,000 rows, and it still works well and fast.

How to insert a formula inside an array?

I found this amazing formula:
=SUM(INDEX(data,N(IF(1,{1,3,5})))) by Jeff Weir.
But I need to calculate some of the numbers inside the array with a formula. Here is an example:
=SUM(INDEX(data,N(IF(1,{Rows(A4:A7),3,5}))))
Excel does not accept this, because an array constant such as {1,3,5} may contain only constant values, not formulas or references. What can I do?
My problem is more complicated in reality, but this info will help me a lot (I think).
//EDIT
My goal is for Excel to return the "Price" of a certain "Name" (let's say Oil) within a certain "Date" range (let's say March 2020). It also has to be the latest date in that range. So, in this case, I want Excel to return the price "80" (the price of Oil), because 20.3.2020 is later than 2.3.2020. Imagine I have a lot of data, with many names, dates, and prices.
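One way to express this goal: among the rows for the given name whose date falls in the range, take the row with the latest date and return its price. (In Excel 2019 and Microsoft 365, one common approach is to find the latest date with MAXIFS and then use a two-criteria INDEX/MATCH.) A JavaScript sketch of the logic; only the Oil price of 80 on 20.3.2020 comes from the question, while the other rows and the `latestPrice` name are hypothetical:

```javascript
// Only Oil on 20.3.2020 at 80 comes from the question; the other rows are hypothetical.
const priceRows = [
  { name: "Oil", date: new Date(2020, 2, 2), price: 75 },  // hypothetical price
  { name: "Oil", date: new Date(2020, 2, 20), price: 80 },
  { name: "Gas", date: new Date(2020, 2, 25), price: 30 }, // hypothetical
  { name: "Oil", date: new Date(2020, 3, 1), price: 90 },  // hypothetical, outside March
];

// Price of `name` on its latest date within [from, to], or null if there is none.
function latestPrice(data, name, from, to) {
  let best = null;
  for (const r of data) {
    if (r.name !== name || r.date < from || r.date > to) continue;
    if (best === null || r.date > best.date) best = r;
  }
  return best === null ? null : best.price;
}

console.log(latestPrice(priceRows, "Oil", new Date(2020, 2, 1), new Date(2020, 2, 31))); // 80
```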
Use CHOOSE to return the array:
=SUM(INDEX(data,N(IF(1,CHOOSE({1,2,3},ROWS(A4:A7),3,5)))))
Depending on your version, this formula may require Ctrl-Shift-Enter instead of Enter when exiting edit mode.
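What the CHOOSE version evaluates to, sketched in JavaScript: CHOOSE({1,2,3}, ROWS(A4:A7), 3, 5) produces the array {4,3,5}, and SUM(INDEX(data, ...)) adds up the entries of `data` at those positions. The `data` values and the `sumAtPositions` name are hypothetical:

```javascript
// Hypothetical single-column `data` range.
const dataRange = [10, 20, 30, 40, 50, 60];

// SUM(INDEX(data, N(IF(1, positions)))): fetch each 1-based position and add them up.
function sumAtPositions(data, positions) {
  return positions.reduce((acc, p) => acc + data[p - 1], 0);
}

const rowsA4toA7 = 4; // ROWS(A4:A7): the range spans four rows
// CHOOSE({1,2,3}, ROWS(A4:A7), 3, 5) evaluates to the array {4, 3, 5}.
const positions = [rowsA4toA7, 3, 5];
console.log(sumAtPositions(dataRange, positions)); // 40 + 30 + 50 = 120
```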

VLOOKUPs to Populate an Excel Table

I'm having trouble writing a VLOOKUP to sort some data.
I have one table that has data that looks like this:
MarkAsOfDate   MaturityDate   ZeroRate
05-May-15      05-May-15      0.006999933
05-May-15      06-May-15      0.006999933
05-May-15      05-Jun-15      0.008996562
05-May-15      06-Jul-15      0.008993128
...            ...            ...
I want to make a table with every instance where the interval between the dates in the first and second columns is exactly one month (such as 05-May-15 and 05-Jun-15), with blanks where no such value exists.
So I made a second table which looks like:
MarkAsOfDate   MaturityDate   Zero Rate 1M
5-May-15       5-Jun-15
6-May-15       6-Jun-15
7-May-15       7-Jun-15
8-May-15       8-Jun-15
9-May-15       9-Jun-15
....           ....
I want to populate this table using data from the first table. I've tried to write a VLOOKUP for it but I'm not sure how to do it with two columns instead of one.
Thanks in advance.
VLOOKUP has some limitations, as you're starting to see. Another option is INDEX/MATCH. Use this in your second table.
Note: I assume your top table is in Sheet1. Put this in your C2, in the second table (under the "Zero Rate 1M" header):
=Index(Sheet1!$C:$C,match(A2&B2,Sheet1!$A:$A&Sheet1!$B:$B,0)) and enter by pressing CTRL+SHIFT+ENTER.
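The array formula joins the two criteria columns into one key per row and looks up the joined key of the current row. A JavaScript sketch using the first four rows of the question's table, with dates converted to Excel serial numbers; `matchConcat` is a hypothetical name, and the "|" separator is an addition that the Excel formula does not have:

```javascript
// Rows from the question's first table, dates as Excel serial numbers
// (42129 = 05-May-15, 42130 = 06-May-15, 42160 = 05-Jun-15, 42191 = 06-Jul-15).
const markCol = [42129, 42129, 42129, 42129];
const maturityCol = [42129, 42130, 42160, 42191];
const rateCol = [0.006999933, 0.006999933, 0.008996562, 0.008993128];

// MATCH(A2&B2, A:A&B:B, 0): compare the joined keys row by row, return a 1-based position.
// The "|" separator keeps pairs such as "1"&"23" and "12"&"3" from colliding.
function matchConcat(colA, colB, a, b) {
  const target = `${a}|${b}`;
  for (let i = 0; i < colA.length; i++) {
    if (`${colA[i]}|${colB[i]}` === target) return i + 1;
  }
  return null; // Excel would return #N/A
}

const pos = matchConcat(markCol, maturityCol, 42129, 42160);
console.log(pos, pos === null ? null : rateCol[pos - 1]); // 3 0.008996562
```

Because the Excel version builds Sheet1!$A:$A&Sheet1!$B:$B over whole columns, it may be slow on large sheets; limiting the ranges to the rows actually used helps.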
There are a lot of ways to do it. If your data is sorted first by MarkAsOfDate and then by MaturityDate, the simplest method is to add a helper column on your raw data tab, say column E. In column E, starting at E2 and copied down, type [assuming MarkAsOfDate is column A and MaturityDate is column B]:
=IF(MONTH(B2)-MONTH(A2)=1,A2,"")
This column will show the MarkAsOfDate for every item which has a MaturityDate 1 month after the MarkAsOfDate; for all other rows it will show "".
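The helper column's test, MONTH(maturity) - MONTH(mark) = 1, compares month numbers only, so it also flags maturities on other days of the following month and misses December to January. A JavaScript sketch contrasting it with a stricter same-day check; `isExactlyOneMonth` is a hypothetical helper, not part of the answer:

```javascript
// Helper-column test: MONTH(maturity) - MONTH(mark) = 1, returning the mark date or "".
// Only the month numbers are compared, so any maturity in the next month qualifies.
function helperMonthDiff(mark, maturity) {
  return maturity.getMonth() - mark.getMonth() === 1 ? mark : "";
}

// Hypothetical stricter test: same day of the month, exactly one calendar month later.
function isExactlyOneMonth(mark, maturity) {
  const year = mark.getFullYear() + (mark.getMonth() === 11 ? 1 : 0); // December rolls into January
  const month = (mark.getMonth() + 1) % 12;
  return maturity.getFullYear() === year &&
    maturity.getMonth() === month &&
    maturity.getDate() === mark.getDate();
}

const mark = new Date(2015, 4, 5); // 05-May-15
for (const maturity of [new Date(2015, 5, 5), new Date(2015, 5, 6)]) {
  console.log(helperMonthDiff(mark, maturity) !== "", isExactlyOneMonth(mark, maturity));
}
// true true   (05-Jun-15)
// true false  (06-Jun-15)
```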
In your special data results tab, use the MATCH function to find the row in column E that matches your current row's MarkAsOfDate, and the INDEX function to return the value from that row in column C. Assuming your raw data is on Sheet1 and your special data results are on Sheet2, type this into E2 on Sheet2 and drag down:
=INDEX(Sheet1!C:C,MATCH(A2,Sheet1!E:E,0))
Another alternative (apart from BruceWayne's recommended Array Formula) would be to use the OFFSET function. OFFSET creates a new range based on a starting point, moving a number of cells to the right/left/up/down, for a given height and width. In this case, we will first use MATCH to find the first time that the MarkAsOfDate on Sheet1 matches Sheet2. We will use that info and the OFFSET function to create a new range which starts there, and ends at the bottom of your data, like so:
=OFFSET(Sheet1!A1,MATCH(A2,Sheet1!A:A,0)-1,1,COUNT(Sheet1!C:C),2)
Then we just need to use VLOOKUP on the range we created above, returning its 2nd column (the ZeroRate), like so:
=VLOOKUP(B2,OFFSET(Sheet1!A1,MATCH(A2,Sheet1!A:A,0)-1,1,COUNT(Sheet1!C:C),2),2,0)
This second alternative avoids needing a helper column, but it is more complex and could be prone to errors if your rows or columns change (because we had to hardcode a couple of things in the OFFSET function). Also, OFFSET is volatile, meaning it recalculates whenever any cell calculates, so it can slow down your workbook if it is used in many rows. Based on that, I recommend you either use the helper column method above or, if you are comfortable with array formulas, BruceWayne's answer.
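The OFFSET/VLOOKUP combination amounts to: find the first row of the current MarkAsOfDate, then search downward from there for the MaturityDate and return its ZeroRate. A JavaScript sketch of that lookup, using the question's rows with dates as Excel serials; `offsetVlookup` is a hypothetical name:

```javascript
// Rows from the question's first table: [MarkAsOfDate, MaturityDate, ZeroRate],
// dates as Excel serials (42129 = 05-May-15, 42160 = 05-Jun-15), sorted by MarkAsOfDate.
const sheet1 = [
  [42129, 42129, 0.006999933],
  [42129, 42130, 0.006999933],
  [42129, 42160, 0.008996562],
  [42129, 42191, 0.008993128],
];

function offsetVlookup(rows, mark, maturity) {
  const start = rows.findIndex(r => r[0] === mark); // MATCH(A2, Sheet1!A:A, 0), 0-based
  if (start === -1) return null;                    // #N/A
  const tail = rows.slice(start);                   // the OFFSET range: from that row to the end
  const hit = tail.find(r => r[1] === maturity);    // exact-match VLOOKUP on the maturity column
  return hit ? hit[2] : null;                       // null stands in for #N/A
}

console.log(offsetVlookup(sheet1, 42129, 42160)); // 0.008996562
```

Because the range runs to the bottom of the data, a MarkAsOfDate that has no matching MaturityDate can pick up a row belonging to a later MarkAsOfDate; the same holds for the spreadsheet formula.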

couchdb map / reduce multiple keys filtering by date

I have a view set up with map/reduce. Right now this code works great:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        emit([id, doc.items[id].name], 1);
      }
    }
  }
}
function(keys, values, rereduce) {
  // "return (keys, sum(prices));" used the comma operator, so only sum(...) was ever returned
  return sum(values);
}
I get results, and when I use the group parameter everything is condensed just fine.
My question: I want to add a third key, DATE, so that I can reduce only the records from certain dates. For example:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        // assumes each document stores its date as doc.date, e.g. "2012-03-13"
        emit([doc.date, id, doc.items[id].name], 1);
      }
    }
  }
}
My issue is that, because the date is at the beginning of the array, the reduce groups by date, then id, and so on. I know I can use group_level to keep only the first key or the first two keys, but that doesn't help either, because as far as I know group_level works from left to right through the array. I could put the date at the end of the emitted array, but that doesn't help either, because I need values at the beginning of my startkey and endkey to search on.
Here is an example of the output:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want all the APPLE rows added up in one row, but here the apples are added up by date first. I was able to add up all the apples if I removed DATE as the first key in the array, but then I couldn't search by date range.
Any ideas on how to accomplish this?
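To make the grouping behaviour concrete, here is a simplified JavaScript model of a grouped reduce. It is not CouchDB code and ignores CouchDB's collation rules: keys are cut to their first group_level elements and equal neighbours are summed. On the four rows above, grouping on [date, id] keeps the two Apple rows apart, while the same rows keyed [name, date] collapse to one Apple row at group_level=1. `groupLevel` is a hypothetical name:

```javascript
// Simplified model of a grouped sum: truncate each key to its first `level`
// elements and add up values of consecutive rows whose truncated keys are equal.
// Rows must already be in key order, as CouchDB returns them.
function groupLevel(rows, level) {
  const out = [];
  for (const { key, value } of rows) {
    const k = key.slice(0, level);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(k)) {
      last.value += value;
    } else {
      out.push({ key: k, value });
    }
  }
  return out;
}

// The four rows from the question, keyed [date, id, name].
const byDate = [
  { key: ["2012-03-13", "356752b8a5f6871f3", "Apple"], value: 1 },
  { key: ["2012-03-20", "123752b8a76986857", "Pear"], value: 1 },
  { key: ["2012-04-12", "3013531de05871194", "Grapefruit"], value: 1 },
  { key: ["2012-04-12", "356752b8a5f6871f3", "Apple"], value: 1 },
];
console.log(groupLevel(byDate, 2).length); // 4: the two Apple rows have different dates

// The same rows re-keyed [name, date] (hand-sorted), as a second view might emit them.
const byName = [
  { key: ["Apple", "2012-03-13"], value: 1 },
  { key: ["Apple", "2012-04-12"], value: 1 },
  { key: ["Grapefruit", "2012-04-12"], value: 1 },
  { key: ["Pear", "2012-03-20"], value: 1 },
];
console.log(groupLevel(byName, 1)[0].value); // 2: both Apple rows in one group
```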
If I correctly understand what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as start_key and end_key.
E.g. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
This will get you all items from date 1 (obviously choose your own format here) with id "someid" and any name. It seems odd that you emit ids before names, and without more information about what you're actually trying to accomplish, it's hard to advise on your general data model. If ID is a "type" ID, meaning that many items share the same ID, then this makes sense. If ID is unique per item, then it does not; in that case, you'd want to emit the name before the ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5, grouped on the first two key elements, i.e. one row per date and id (apples, oranges, etc.). I use this exact technique in a very large-scale production application. I formatted the dates as easily human-readable integers of the form yyyymmdd, so that they sort chronologically (20140624 sorts after 20140601). If I want everything from the start of the month until now, grouped by date and by my group ids, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
To drill to the detail level for a particular id, or just use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work - I've solved this exact problem in a production application using the techniques described.
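A JavaScript sketch of the yyyymmdd integer keys and the month-to-date query range described above; `toYyyymmdd` and `monthToDateRange` are hypothetical helpers, and in an actual HTTP request the JSON values would still need URL-encoding (for example with encodeURIComponent):

```javascript
// Encodes a date as an integer yyyymmdd, so numeric order equals chronological order.
function toYyyymmdd(date) {
  return date.getFullYear() * 10000 + (date.getMonth() + 1) * 100 + date.getDate();
}

// startkey/endkey (as JSON strings) covering the first of the month through `today`.
// The {} in endkey sorts after any other value, so every deeper key on `today` is included.
function monthToDateRange(today) {
  const start = toYyyymmdd(new Date(today.getFullYear(), today.getMonth(), 1));
  const end = toYyyymmdd(today);
  return { startkey: JSON.stringify([start]), endkey: JSON.stringify([end, {}]) };
}

const range = monthToDateRange(new Date(2014, 5, 24));
console.log(range.startkey, range.endkey); // [20140601] [20140624,{}]
```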
Edit 2
If you want to group all apples regardless of date, then you'll need to let apples be the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make 2 views. That's just how key ordering works... If you need fast response times for both types of queries (all item types over a date range, and all of a particular item regardless of date), I believe 2 views is the only way to achieve that.
Note
Another thing to note is your reduce function. Wherever possible, it is highly recommended that you use the built-in reduce functions. They're implemented in Erlang and are highly optimized compared to custom JavaScript reduce functions.
In your case, just replace your reduce function with this
_sum
Easy hey?
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.
