VLookup style data structure - arrays

Problem:
I'm looking for an efficient data structure in VBA that lets me look up a value in one 'column' and find the corresponding value in another column. All columns have the same fixed length.
Background:
Essentially I have 2 Enums, each with n items, and an array of n strings; I'd like to pass the ith value from any of these sets and return the ith value from another specified set.
One option would be a Collection of Arrays; the Collection would have keys corresponding to the type of list (e.g. Enum1, Enum2, StringList), and I would be able to make a function which takes two list keys and a lookup value as arguments, and returns the corresponding value in the second column with a loop:
Function findCorresponding(dataTable As Collection, header1 As String, header2 As String, lookupVal As Variant) As Variant
    Dim array1 As Variant, i As Long
    array1 = dataTable(header1) 'pick out array from collection (no Set: arrays are not objects)
    For i = LBound(array1) To UBound(array1) 'loop through to find lookup val
        If array1(i) = lookupVal Then
            findCorresponding = dataTable(header2)(i) 'return corresponding val
            Exit Function
        End If
    Next i
    'lookup val not found: the function returns Empty
End Function
And sure, I could replace the lookup arrays with un-Keyed Collections to avoid looping. But that doesn't seem like the most efficient way (I believe dictionaries hash rather than loop, so they would be faster on that front, but they carry a lot of extra baggage compared to an array).
What I really want is something like a Scripting.Dictionary, where you can access both values and keys and use one to get the other, but with a third parameter that can be found using either of the other two and can be used to find either of the other two.
Something that extends to n columns would also be useful.
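One way to picture the structure being asked for is a set of equal-length columns with one hash index per column, so any column can act as the lookup key for any other. The sketch below is in Python rather than VBA, purely for illustration; `ColumnTable` and `find_corresponding` are hypothetical names, and in VBA the per-column index could plausibly be a Scripting.Dictionary mapping value to position.

```python
from typing import Any, Dict, Hashable, List, Sequence


class ColumnTable:
    """Equal-length named columns with one hash index per column."""

    def __init__(self, columns: Dict[str, Sequence[Hashable]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns: Dict[str, List[Hashable]] = {
            name: list(values) for name, values in columns.items()
        }
        # One value -> row-position map per column, built once up front.
        # If a value repeats within a column, the first occurrence wins.
        self._index: Dict[str, Dict[Hashable, int]] = {}
        for name, values in self._columns.items():
            positions: Dict[Hashable, int] = {}
            for row, value in enumerate(values):
                positions.setdefault(value, row)
            self._index[name] = positions

    def find_corresponding(self, source: str, target: str, lookup_value: Hashable) -> Any:
        """Return the `target` entry on the row where `source` equals `lookup_value`."""
        row = self._index[source][lookup_value]  # raises KeyError if absent
        return self._columns[target][row]


# Example data is made up; the column names follow the question.
table = ColumnTable({
    "Enum1": [0, 1, 2],
    "Enum2": [10, 20, 30],
    "StringList": ["red", "green", "blue"],
})
```

The trade-off is the one raised in the question: each extra index costs memory proportional to the column length, in exchange for lookups that do not loop.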

Related

How to iterate through a 2D Array with an inner set of non-fixed length?

While trying to assign unique values along a 2D array, I can't work out how to do this appropriately.
I have a 2D structure array[1..N_SECTIONS] of var set of int: content and a set of unique elements, items, that need to be distributed across content by a certain rule, such that in the end no element occurs more than once in any of the individual sets of content.
I tried to solve this with a nested forall, but I failed because I don't see a way to append new elements to individual sets of unknown size. The same goes if I convert this structure to a real 2D array like array[1..N_SECTIONS,1..N_ITEMS] of int: content, because each section can have a different number of members. In addition, the array would be heavily oversized, since N_SECTIONS * N_ITEMS >> N_ITEMS (i.e. the number of unique elements to assign).
Unfortunately I couldn't find a neat solution to constrain such a distribution in the official docs.
Thanks for any hints in advance!
Update:
Here's a dataset (Note: the assignment rule here does not guarantee a unique representation of each element, it is just used for demonstration)
array[1..5] of int: items = [1,2,3,4,5];
array[1..20] of var set: sections;
constraint forall(cur_sec in 1..3)(
  forall(item in items)(
    item mod cur_sec == 0 ->
      sections[cur_sec][card(sections[cur_sec])] = item)); % this particular line is not working
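Read in set terms, the demonstration rule only says that an item belongs to a section's set whenever `item mod cur_sec == 0`; no position inside the set is involved, so there is nothing to "append" to. The following Python sketch (not MiniZinc, and not a constraint model) just computes the smallest sets that satisfy that rule for the sample data, to show the membership view. In MiniZinc the analogous constraint would presumably be written with the `in` operator on a `var set of int`, which is an assumption about the intended model rather than something taken from the post.

```python
from typing import Dict, List, Set

items: List[int] = [1, 2, 3, 4, 5]

# One set per section; membership replaces "write at the next free index".
# Only sections 1..3 are filled, as in the sample constraint.
sections: Dict[int, Set[int]] = {
    cur_sec: {item for item in items if item % cur_sec == 0}
    for cur_sec in range(1, 4)
}
```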

Creating an Excel VBA array with unassigned variables as entries

I am trying to set up a workflow/UX where a CSV file is imported and each column of the imported file is assigned a "data type" using dropdowns at the top of each column. Once these data types are designated/assigned to each column, another macro populates a second sheet with the imported CSV data, where the location in the new sheet is dependent on the data type designated for each column of the imported data.
For example, if the first column of the imported data is of data type "DataA", the dropdown selection would be selected as such for this first column (from a total of 12 "data types" in the dropdown menu). This "DataA" data would then be populated in the second sheet in its fifth column.
Here is the code I have so far:
Dim DataA As Integer, DataB As Integer, DataC As Integer, DataD As Integer, DataE As Integer, DataF As Integer, DataG As Integer, DataH As Integer, DataI As Integer, DataJ As Integer, DataK As Integer, DataL As Integer
Dim ColArray(12) As Variant
For p = 1 To LastColImport 'This is a previously-defined/assigned variable
    q = 1
    Do While q <= 12
        If ActiveSheet.DropDowns(p).Value = q Then
            ColArray(q) = p
            Exit Do
        Else
            q = q + 1
        End If
    Loop
Next p
This populates the ColArray array with an integer entry if the data type is selected, or an empty entry if it has not been selected. The next step I want to do is assign each ColArray entry value to a named variable, so that I can call the ColArray entry values by data type name instead of having to remember or look up what data type each ColArray integer value refers to.
I can't find a built-in "dropdown list range name" recall function anywhere, so what I would like to do is the following:
Dim ColArrayNames(12) As Variant
ColArrayNames(1) = DataA 'These variables were defined in the previous code block
ColArrayNames(2) = DataB
...
ColArrayNames(12) = DataL
ColArrayNames = ColArray
I realize that in this specific case, it would probably be easier to just assign the data type variables directly to the ColArray values, instead of putting them into an array and then equating the array values. I feel like populating an array with unassigned variables could be useful in other cases as well. My attempts at using this method of assigning variables have failed.
After changing the last line of code to:
For i = 1 to 12
ColArrayNames(i) = ColArray(i)
Next i
The ColArray values don't get assigned to the data type variables. That being said, the ColArrayNames entries are assigned the correct values, so the issue seems to be the "last step" in assigning the ColArray values to the data type variables by way of the ColArrayNames array of unassigned variables.
If anyone has suggestions for how to approach this "general" problem of using arrays of unassigned variables to assign values to each array entry (while preserving the ability to call these values using the entries' "original" variable names), or if there's a more efficient way of approaching this spreadsheet function altogether, please let me know!
EDIT 1: As requested by John Coleman, I'll elaborate a bit more on what I'm trying to do here.
Once I have the imported column numbers assigned to a data type, I want to send the data to a second sheet with some code in a manner such as:
For i = 2 To LastRow 'The LastRow variable value will be found using a simple xlDown search process
    Worksheets(2).Cells(i, 1).Value = Worksheets(1).Cells(i, DataA).Value 'Cells takes (row, column)
    Worksheets(2).Cells(i, 4).Value = Worksheets(1).Cells(i, DataB).Value
    'Etc.
Next i
Again, I realize that I could just as easily use
Worksheets(2).Cells(i, 1).Value = Worksheets(1).Cells(i, ColArrayNames(1)).Value
and so on, but I feel like if what I'm asking about is possible, I might be able to use it in another situation (even if it's not the most ideal method for this example).
As best I understand your problem, it comes down to effectively (and efficiently) defining a data map between the original source data and the formatted output (in this case, cells on a second worksheet).
When confronted with the data typing part of the problem, I realized the ultimate "holder" of the destination data type is the object that holds it -- in this case it's the worksheet cell, not the VBA variable or array. Your focus seems to be on the VBA code that transfers/copies the data from one worksheet to another (or to/from variant arrays). Your own choice in efficiency should be your guide here. If you want to use relatively general-purpose code for multiple data types, my suggestion is to use a Variant value or array and, after copying it to the destination cell, set the format of the destination cell (which effectively "types" it for whatever later use is necessary).
A dictionary may not even be necessary if the mapping table can be flexibly scanned to accommodate any number of data types, formats, or columns.
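As a rough illustration of the "data map" idea, the sketch below keeps two small tables: one from data type name to the imported column that selected it, and one from data type name to its destination column. It is written in Python only to show the shape of the mapping; `build_source_map`, `transfer`, the three-entry type list and the destination columns for DataB and DataC are all assumptions made for the example (the post only states that DataA goes to the fifth column).

```python
from typing import Dict, List, Optional

# Assumed for illustration: only three of the twelve data types, and the
# destination columns for DataB and DataC (the post only fixes DataA -> 5).
DATA_TYPES: List[str] = ["DataA", "DataB", "DataC"]
DESTINATION_COLUMN: Dict[str, int] = {"DataA": 5, "DataB": 1, "DataC": 4}


def build_source_map(dropdown_values: List[int]) -> Dict[str, int]:
    """Map each selected data type name to its 1-based imported column."""
    source_map: Dict[str, int] = {}
    for column, choice in enumerate(dropdown_values, start=1):
        source_map[DATA_TYPES[choice - 1]] = column
    return source_map


def transfer(rows: List[List[object]], source_map: Dict[str, int],
             width: int) -> List[List[Optional[object]]]:
    """Copy each row into a new layout using the name -> column maps."""
    output = []
    for row in rows:
        new_row: List[Optional[object]] = [None] * width
        for name, source_column in source_map.items():
            new_row[DESTINATION_COLUMN[name] - 1] = row[source_column - 1]
        output.append(new_row)
    return output


# Column 1 was tagged DataB (choice 2), column 2 was tagged DataA (choice 1).
source_map = build_source_map([2, 1])
moved = transfer([["b1", "a1"], ["b2", "a2"]], source_map, width=5)
```

With names as keys, the code can refer to `source_map["DataA"]` instead of remembering which array slot belongs to which data type.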

Access array elements from string argument in Modelica

I have a task in Modelica where, within a function, I want to read values from a record (parameters) according to a given string argument, similar to the dictionary type in Python.
For example, I have a record containing coefficients for different media, and I want to read out the coefficients for methane, so my argument is the string "Methane".
So far I solve this by keeping a second array in my coefficients record that stores the names of the media as strings. I scan this array in a for loop to match the requested medium name and then access the coefficients array using the index found.
This is obviously very complicated and leads to a lot of confusing code and nested for loops. Isn't there a more convenient way like the one Python presents with its dictionary type, where a string is directly linked to a value?
Thanks for the help!
There are several different alternatives you can use. I will add the pattern I like most:
model M
  function index
    input String[:] keys;
    input String key;
    output Integer i;
  algorithm
    i := Modelica.Math.BooleanVectors.firstTrueIndex({k == key for k in keys});
  end index;
  constant String[3] keys = {"A","B","C"};
  Real[size(keys,1)] values = {1,2*time,3};
  Real c = values[index(keys,"B")] "Coefficient";
  annotation(uses(Modelica(version="3.2.1")));
end M;
I like this code because a Modelica compiler can make it efficient. You create a keys vector and a corresponding data vector. It is not a record because you want the keys vector to be constant, while the values may vary over time (which gives a more generic dictionary than you asked for).
The compiler can then create a constant index for any constant names you want to look up. This makes sorting and matching better in the compiler (since there are no unknown indexes). If there is a key you want to look up at run-time, the code will work for that as well.
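For readers coming from Python, the same keys-vector plus values-vector pattern can be sketched as follows. This is an illustrative translation, not Modelica; it keeps the 1-based result and assumes, as I understand `firstTrueIndex` to behave, that 0 is returned when no key matches. The `values_at` and `coefficient` helpers are hypothetical names.

```python
from typing import List, Sequence


def index(keys: Sequence[str], key: str) -> int:
    """1-based position of the first matching key, or 0 if none matches."""
    for position, candidate in enumerate(keys, start=1):
        if candidate == key:
            return position
    return 0


keys = ("A", "B", "C")  # constant keys vector


def values_at(time: float) -> List[float]:
    """Values vector; the second entry varies with time, as in the model."""
    return [1.0, 2.0 * time, 3.0]


def coefficient(time: float) -> float:
    # The key is constant, so the index could be computed once up front.
    return values_at(time)[index(keys, "B") - 1]
```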

How do you extract a subarray from an array in a worksheet function?

Is there some way of getting an array in Excel of a smaller size than a starting array in a cell worksheet function?
So if I had:
{23, "", 34, 46, "", "16"}
I'd end up with:
{23, 34, 46, 16}
which I could then manipulate with some other function.
Conclusion: If I was to do a lot of these I would definitely use jtolle's UDF comb solution. The formula that PPC uses is close, but diving in and testing, I found it gives errors in the empty slots, misses the first value, and there is an easier way to get the row numbers, so here is my final solution:
=IFERROR(INDEX($A$1:$A$6, SMALL(IF(($A$1:$A$6<>""),ROW($A$1:$A$6)),ROW(1:6))),"")
This must be entered as an array formula (CTRL-SHIFT-ENTER). If the result is to be displayed, it must be entered in an area at least as big as the result set to show all results.
If all you want to do is grab a subset of an array, and you already know the positions of the elements you want, you can just use INDEX with an array for the index argument. That is:
=INDEX({11,22,33,44,55},{2,3,5})
returns {22,33,55}. But that's usually not very useful because you don't know the positions, and I don't know any way to get them without a UDF.
What I have done for this kind of in-worksheet array filtration is to write a UDF with the following form:
'Filters an input sequence based on a second "comb" sequence.
'Non-False-equivalent, non-error values in the comb represent the positions of elements
'to be kept.
Public Function combSeq(seqToComb, seqOfCombValues)
'various library calls to work with 1xn or nx1 arrays or ranges as well as 1-D arrays
'iterate the "comb" and collect positions of keeper elements
'create a new array of the right length and copy in the keeper elements
End Function
I only posted pseudocode because my actual code is all calls to library functions, including the collect-positions and copy-from-positions operations. It would probably obscure the basic idea, which is pretty simple.
You'd call such a UDF like so:
=combSeq({23, "", 34, 46, "", "16"}, {23, "", 34, 46, "", "16"} <> "")
or
=combSeq(Q1:Q42, SIN(Z1:Z42) > 0.5)
and use Excel's normal array mechanics to generate the "comb". It's a lightweight, Excel-friendly way to get a lot of the benefits of the more standard filter(list-to-filter, test-function) function you might see in other programming systems.
I use the name "comb" because "filter" usually means "filter with this function", and with Excel you have to apply the test function before calling the filtration function. Also it can be useful to compute one "comb" as an intermediate result and then use it to...er, comb...multiple lists.
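The pseudocode above can be sketched in a few lines. This version is in Python rather than VBA and handles only flat sequences of equal length; it leaves out the 1xn/nx1 range handling and the error-value check mentioned in the comments, and `comb_seq` is simply a Python spelling of the name used in the post.

```python
from typing import Any, List, Sequence


def comb_seq(seq_to_comb: Sequence[Any], comb: Sequence[Any]) -> List[Any]:
    """Keep the elements whose matching comb entry is truthy."""
    if len(seq_to_comb) != len(comb):
        raise ValueError("sequence and comb must have the same length")
    # Step 1: iterate the comb and collect the positions of keeper elements.
    keep_positions = [i for i, flag in enumerate(comb) if flag]
    # Step 2: build a new list of the right length from those positions.
    return [seq_to_comb[i] for i in keep_positions]


data = [23, "", 34, 46, "", "16"]
comb = [value != "" for value in data]  # the test is applied before the call
result = comb_seq(data, comb)
```

Because the comb is computed separately, the same `comb` list can be reused to filter several parallel lists. Python's standard library offers `itertools.compress` for the same job.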
There is an answer on this site: http://www.mrexcel.com/forum/showthread.php?t=112002. Not much explanation though.
Assuming you have data with blank cells in column A and you put this in column B, it will retrieve the data in the same order, skipping the blanks:
=INDEX( $A$1:$A$6,
SMALL(
IF(
($A$2:$A$6<>""),
ROW($A$2:$A$6)
),
ROW()-ROW($B$1)
)
)
Here is the explanation:
ROW()-ROW($B$1) is just a trick that will give you an incrementing number (ie 1 in B1, 2 in B2...)
IF (... , ROW($A$2:$A$6) ) is the main part of the trick: it builds an array of the row numbers where the IF condition is true (note that the IF has no 'else' value)
SMALL(..) will return the Xth smallest value of that array (in our case the number of the Xth nonblank row), where X is the row number of the current cell (1 in B1 ...)
INDEX will then translate from the row number to its value
Note that INDEX and ROW start one row above the actual table to always have an offset > 0 (INDEX does not like zeros)
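The three steps (IF builds row numbers, SMALL picks the k-th one, INDEX fetches the value) can be traced in a short Python sketch. For simplicity it uses the same six-cell range for every step rather than the one-row offset described in the note, and the contents of `column_a` are made up for the example.

```python
from typing import List

column_a: List[str] = ["x", "", "y", "z", "", "w"]  # hypothetical A1:A6

# IF(range<>"", ROW(range)): row numbers of the non-blank cells.
# (Excel leaves FALSE in the blank slots; here they are simply dropped.)
nonblank_rows: List[int] = [
    row for row, value in enumerate(column_a, start=1) if value != ""
]


def kth_nonblank(k: int) -> str:
    """SMALL(rows, k) picks the k-th smallest row; INDEX fetches its value."""
    row = sorted(nonblank_rows)[k - 1]
    return column_a[row - 1]


# One call per output cell, with k = 1, 2, 3, ... down the column.
compacted = [kth_nonblank(k) for k in range(1, len(nonblank_rows) + 1)]
```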
The above answers all give brittle formulas that cannot be moved to different locations on the sheet and are very sensitive to inserted rows and columns.
Here is a version that is not sensitive and can be moved around to any row:
=INDEX($A$10:$A$40, SMALL(IF(B$10:B$40,ROW(INDIRECT("1:30"))),ROW(INDIRECT("1:30"))))
In this example the original array values are placed in $A$10:$A$40 (perhaps by using the array formula {TRANSPOSE(originalArray)} if the original data was a row instead of a column).
Column B$10:B$40 contains boolean flags (TRUE or FALSE) that determine if this array element should be preserved in the result (TRUE) or not (FALSE). You can populate this column using any function you want. To create the test mentioned in the OP, <>"", B$10 should be filled with: =A10<>"" (and then copied down thru B$40). Column A has absolute column references and column B has relative column references, so the formula can be copied over into columns further to the right, allowing you to create other types of attributes and sub-arrays, which will be governed by boolean tests you put in columns C and D etc.
This example will handle an original array of up to 30 elements. For a larger array, adjust the ranges $A$10:$A$40 and B$10:B$40 (which represent 30 rows) and also adjust the two occurrences of "1:30" to suit.
A possible worksheet function solution:
=INDEX(A1:A6,N(IF(1,MODE.MULT(IF(A1:A6<>"",ROW(1:6)*{1,1})))))
The MODE.MULT function returns a reduced array of indices and N(IF(1,.)) is inserted so that the array is passed by-reference to the INDEX function.
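The reason for multiplying by `{1,1}` is, as far as I can tell, that MODE.MULT only returns values that occur more than once, so every kept row number is deliberately listed twice and all of them tie as modes. A Python sketch of that idea, using `statistics.multimode` (Python 3.8+) as a stand-in for MODE.MULT:

```python
from statistics import multimode
from typing import List

column_a = [23, "", 34, 46, "", "16"]  # the array from the question

# ROW(1:6)*{1,1}: each non-blank row number appears twice; blanks are left out.
doubled: List[int] = [
    row
    for row, value in enumerate(column_a, start=1)
    if value != ""
    for _ in range(2)
]

# Every remaining row number occurs equally often, so all of them are modes.
kept_rows = multimode(doubled)

# INDEX(A1:A6, kept_rows)
subarray = [column_a[row - 1] for row in kept_rows]
```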

Skip column in an array

I have a VBA function that returns an array to be displayed in Excel. The array's first two columns contain IDs that don't need to be displayed.
Is there any way to modify the Excel formula to skip the first two columns, without going back to create a VBA helper to strip off the columns?
The formula looks like this, where the brackets let the array be displayed across a span of cells:
{=GetCustomers($a$1)}
The closest thing Excel has to built-in array manipulation is the 'INDEX' function. In your case, if the array returned by your 'GetCustomers' routine is a single row, and if you know how long it is (which I guess you do since you're putting it into the sheet), you can get what you want by doing something like this:
=INDEX(GetCustomers($A$1),{3,4,5})
So say GetCustomers() returned the array {1,2,"a","b","c"}, the above would just give back {"a","b","c"}.
There are various ways to save yourself having to type out your array of indices, though. For example,
=COLUMN(C1:E1)
will return {3,4,5}, and you can use that instead:
=INDEX(GetCustomers($A$1),COLUMN(C1:E1))
This trick doesn't work with a true 2-D array, though. 'INDEX' will return a whole row or column if you pass in a zero in the right place, but I don't know how to make it return a 2-D subset. EDIT: You can do it, but it's cumbersome. Say your array is 2x5, and you want the last three columns. You could use:
=INDEX(GetCustomers($A$1), {1,1,1;2,2,2}, {3,4,5;3,4,5})
(FURTHER EDIT: chris neilsen provides a nice way to compute those arrays in his answer.)
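The two-argument form pairs the row-index array and the column-index array element by element, which is why both have to be spelled out in full. A small Python sketch of that pairing, with a made-up 2x5 array standing in for the result of GetCustomers and `index_2d` as a hypothetical helper name:

```python
from typing import Any, List


def index_2d(array: List[List[Any]], row_idx: List[List[int]],
             col_idx: List[List[int]]) -> List[List[Any]]:
    """Pick array[r][c] for each paired (r, c), using 1-based indices."""
    return [
        [array[r - 1][c - 1] for r, c in zip(rows, cols)]
        for rows, cols in zip(row_idx, col_idx)
    ]


customers = [[1, 2, "a", "b", "c"],
             [3, 4, "d", "e", "f"]]  # made-up 2x5 array

rows = [[1, 1, 1], [2, 2, 2]]  # {1,1,1;2,2,2}
cols = [[3, 4, 5], [3, 4, 5]]  # {3,4,5;3,4,5}
last_three = index_2d(customers, rows, cols)
```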
Charles Williams has a link on his website that explains more about using arrays like this here:
http://www.decisionmodels.com/optspeedj.htm
He posted that in response to this question I asked:
Is there any documentation of the behavior of built-in Excel functions called with array arguments?
Personally, I use VBA helper functions for things like this. With the right library routines, you can do something like:
=subseq(GetCustomers($A$1),3,3)
in the 1-D case, or:
=hstack(subseq(asCols(GetCustomers($A$1)),3,3))
in the 2-D case, and it's much more clear.
The simplest solution is to just hide the first two columns.
Another may be to use OFFSET to resize the returned array.
The syntax is:
OFFSET(reference,rows,cols,height,width)
I suggest modifying the 'GetCustomers' function to include an optional Boolean variable to tell the function to return all the columns, or just the single column. That would be the cleanest solution, instead of trying to handle it on the formula side.
Public Function GetCustomers(rng As Range, Optional stripColumns As Boolean = False) As Variant()
    If stripColumns Then 'Resize array to meet your needs
    Else 'return full sized array
    End If
End Function
You can use the INDEX function to extract items from the returned array
- the formula is in a range starting at cell B2
{=INDEX(getcustomers($A$1),ROW()-ROW($B$2)+1,COLUMN()-COLUMN($B$2)+3)}
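The arithmetic in this formula turns each cell's own position into a pair of array indices: the top-left cell B2 maps to row 1, column 3 of the returned array, which is what skips the first two columns. A Python sketch of that mapping, where `cell_to_array_index` and its parameter names are hypothetical:

```python
from typing import Tuple


def cell_to_array_index(cell_row: int, cell_col: int,
                        anchor_row: int = 2, anchor_col: int = 2,
                        skipped_cols: int = 2) -> Tuple[int, int]:
    """Map a worksheet cell to 1-based (row, column) indices in the array.

    The anchor is the top-left cell of the formula range (B2 = row 2, col 2).
    """
    array_row = cell_row - anchor_row + 1                 # ROW()-ROW($B$2)+1
    array_col = cell_col - anchor_col + 1 + skipped_cols  # COLUMN()-COLUMN($B$2)+3
    return array_row, array_col
```

Moving the formula range only requires changing the anchor, and skipping a different number of leading columns only changes the final offset.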

Resources