Simple Hash Table Excel VBA - arrays

I'm a long-time user of arrays in VBA, but I recently learned a bit about hashing and I was wondering if I could use it to build more efficient searches in my arrays. To keep it specific, what I did was turn a two-dimensional array into a dictionary of rows, where the key is a string (which of course is unique) found in a 'cell' and turned into a Double via Asc.
I guess the code below explains what I mean:
Private pHook As Object

Sub test()
    Set pHook = CreateObject("Scripting.Dictionary")
    key = StoAsc("SomeStringOneWantstoFind")
    If Not pHook.Exists(key) Then pHook.Add key, "TEST"
    d = pHook(key)
End Sub

Public Function StoAsc(stg As String) As Double
    Dim key As String
    key = ""
    For ii = 1 To Len(stg)
        S = Asc(Mid(stg, ii, 1))
        key = key & S
    Next ii
    StoAsc = CDbl(key)
End Function
It looks like it works, and it did the job of avoiding the loop when I just want to find something in the data.
But I can't get rid of the idea that there should be an easier and more logical path than building the hashing myself. Am I on a good path? Are there easier ways to 'hash an array' so I don't have to loop around every time I need something?

Dictionaries allow strings (or any data type except arrays) to be used as key values (and as item values). So, as you suspected, you have no need to do any hashing yourself: all you need to do is store "SomeStringOneWantstoFind" as both the key and the value.
There is an exists method on the dictionary object that lets you find out whether a key value exists which can be used to do this.
Collections can be set up with just a key value, so you could use a collection instead of a dictionary but collections do not have the exists method.
I'm quite new to collections/dictionaries and arrays, so I created a useful crib sheet which I have shared here
I'd welcome your input, as I still feel I don't quite get it, and I'm sure you have moved on since you wrote this question.
Here's my understanding of your question and what you are doing.
In your code you convert "SomeStringOneWantstoFind" to a unique number (using Asc) and store this as the key and "TEST" as the text. I suspect in reality you would store "SomeStringOneWantstoFind" as the value.
So why are you doing this is the question!
You mention hashing. So you want to look up a text value to see if it is in the dictionary. ie find out whether "MyTextToFind" exists.
So I assume you are converting "MyTextToFind" using Asc in a similar way then using the dictionary exists to see if it is there.
This is all a bit unnecessary - I think.
Be aware that Dictionaries always need a key and an Item (i.e. a value).
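As a rough illustration of why the string itself is the safer key, here is a minimal sketch in Python (used here only for illustration; it is not VBA). The function sto_asc is a hypothetical port of the question's StoAsc, and Python's float stands in for VBA's Double, on the assumption that both are 64-bit IEEE doubles with roughly 15 to 17 significant digits. Under that assumption, long strings that share a prefix can end up with the same numeric key, while the strings themselves stay distinct.

```python
def sto_asc(text: str) -> float:
    """Hypothetical port of the question's StoAsc:
    concatenate the character codes, then convert to a double."""
    digits = "".join(str(ord(ch)) for ch in text)
    return float(digits)


# 26 distinct words that differ only in their last letter.
words = ["AAAAAAAAAA" + chr(code) for code in range(ord("A"), ord("Z") + 1)]

string_keys = set(words)                     # the strings used directly as keys
numeric_keys = {sto_asc(w) for w in words}   # the hand-made numeric "hash"

print(len(string_keys))                      # 26: every word keeps its own key
print(len(numeric_keys) < len(string_keys))  # True: several words share a numeric key
```

Each numeric key here has 22 digits, far more than a double can hold exactly, so the last letter is rounded away. Using the string as the dictionary key avoids the problem entirely.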

Related

What is the difference between indexing an array and a dictionary?

From my understanding, an array is a simple table of values, such as local t = {"a","b","c"}, and a dictionary is a table of objects, such as local t = {a = 1, b = 2, c = 3} Of course, let me know if I'm wrong in either or both cases.
Anyways, my question lies in how we index the entries in either of these cases. For example, let's say I have the following code:
local t = {"TestEntry"}
print(t["TestEntry"])
Of course, this prints nil. However, when we use a dictionary the same way:
local t = {TestEntry = 1}
print(t["TestEntry"])
This, naturally, prints 1. My question is, why does it work this way for dictionaries, but not arrays?
Finally, I'd like to address the issue that led me to this question. Let's say, before I want to run a chunk of code, I need to see if a specific value is inside a table. It would be convenient if I could just check if it is in the table with table["GivenEntry"], but, as we have seen, this would only work if the entry in the table is actually an object. In my specific case, I am simply using an array, so it is not an object.
Thus, I had to resort to using a for loop to check the table:
local t = {"TestEntry1","TestEntry2"}
for i,v in pairs(t) do
    if v == "TestEntry1" then
        --do code
    end
end
After doing this, it almost seemed as if it would be easier to create a silly dictionary, like:
local t = {TestEntry1 = "TestEntry1"}
because then, I could simply run t["TestEntry1"], and I wouldn't have to worry about having an empty table (because then the for loop would not run). Are there ramifications to creating a dictionary for such purposes? Is it less efficient in general?
Your input is appreciated,
Thank you.
In Lua both arrays and dictionaries are the same type (the table). local t = {"TestEntry"} is essentially short for local t = {[1] = "TestEntry"} (Lua needs the brackets for a numeric key; you would access the value with t[1]).
So the options for checking whether "TestEntry1" is in the table are the ones you have written. A dictionary takes more memory and, depending on how many values you have, may take a while to create, but accessing a key should be constant time. Looping through the table, on the other hand, takes longer the more items you have, so it is a tradeoff you have to decide on.
There are faster ways to search an array, however (e.g. if it is sorted: https://en.wikipedia.org/wiki/Binary_search_algorithm).
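The same tradeoff exists outside Lua. The sketch below uses Python purely for illustration and compares the three approaches mentioned above: a linear scan, a keyed lookup built once, and a binary search on sorted data. The function names contains_linear and contains_sorted are made up for this example.

```python
from bisect import bisect_left

entries = ["TestEntry1", "TestEntry2"]


def contains_linear(items, target):
    """Linear scan: cost grows with the number of items."""
    for item in items:
        if item == target:
            return True
    return False


# Keyed lookup: costs extra memory and a one-off build step,
# after which each membership test is constant time on average.
entry_set = set(entries)


def contains_sorted(sorted_items, target):
    """Binary search: logarithmic cost, but only valid on sorted input."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


print(contains_linear(entries, "TestEntry1"))        # True
print("TestEntry1" in entry_set)                     # True
print(contains_sorted(sorted(entries), "Missing"))   # False
```

For a handful of entries the linear scan is perfectly adequate; the keyed lookup starts to pay off when the same collection is searched many times.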

Access array elements from string argument in Modelica

I'm having a task in Modelica, where within a function, I want to read out values of a record (parameters) according to a given string type argument, similar to the dictionary type in Python.
For example, I have a record containing coefficients for different media, and I want to read out the coefficients for methane, so my argument is the string "Methane".
So far I have solved this by adding a second array to my coefficients record that stores the names of the media as strings. I search this array in a for loop to match the requested medium name and then access the coefficients array using the index that was found.
This is obviously very complicated and leads to a lot of confusing code and nested for loops. Isn't there a more convenient way like the one Python presents with its dictionary type, where a string is directly linked to a value?
Thanks for the help!
There are several different alternatives you can use. I will add the pattern I like most:
model M
  function index
    input String[:] keys;
    input String key;
    output Integer i;
  algorithm
    i := Modelica.Math.BooleanVectors.firstTrueIndex({k == key for k in keys});
  end index;
  constant String[3] keys = {"A","B","C"};
  Real[size(keys,1)] values = {1,2*time,3};
  Real c = values[index(keys,"B")] "Coefficient";
  annotation(uses(Modelica(version="3.2.1")));
end M;
The reason I like this code is because it can be made efficient by a Modelica compiler. You create a keys vector, and a corresponding data vector. The reason it is not a record is that you want the keys vector to be constant, and the values may vary over time (for a more generic dictionary than you wanted).
The compiler can then create a constant index for any constant names you want to lookup from this. This makes sorting and matching better in the compiler (since there are no unknown indexes). If there is a key you want to lookup at run-time, the code will work for this as well.
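Since the question compares this with Python's dictionary, here is a loose Python rendering of the same keys-vector plus values-vector pattern. It is only a sketch: KEYS, index, values_at and coefficient are illustrative names, Python indices are 0-based where Modelica's are 1-based, and a missing key raises an error here rather than returning 0.

```python
KEYS = ("A", "B", "C")  # fixed key vector, mirroring `keys` in the model


def index(keys, key):
    """Position of `key` in `keys` (0-based; raises ValueError if absent)."""
    return keys.index(key)


# The key is a constant, so its index can be resolved once up front,
# which is the job the answer expects the Modelica compiler to do.
B_INDEX = index(KEYS, "B")


def values_at(time):
    """Values vector; the second entry varies with time as in the model."""
    return [1.0, 2.0 * time, 3.0]


def coefficient(time):
    return values_at(time)[B_INDEX]


print(coefficient(0.5))  # 1.0
```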

Hiding vars in strings VS using objects with properties?

So, I've got a word analyzing program in Excel with which I hope to be able to import over 30 million words.
At first, I created a separate object for each of these words so that each word has a...
.value '(string), the actual word itself
.bool1 '(boolean)
.bool2 '(boolean)
.bool3 '(boolean)
.isUsed '(boolean)
.cancel '(boolean)
When I found out I may have 30 million of these objects (all stored in a single collection), I thought that this could be a monster to compile. And so I decided that all my words would be strings, and that I would stick them into an array.
So my array idea is to prefix each of the 30 million strings with 5 spaces (for my 5 bools), with each space representing a false bool val, e.g.
If Mid$(arr(n), 3, 1) = " " Then
    'my 3rd bool val is false.
ElseIf Mid$(arr(n), 3, 1) = "*" Then '(I'll insert a '*' to denote true)
    'my third bool val is true.
End If
Anyway, what do you guys think? Which way (collection or array) should I go about this (for optimization specifically)?
(I wanted to make this a comment but it became too long)
An answer would depend on how you want to access and process the words, once stored.
There are significant benefits and distinct advantages for 3 candidates:
Arrays are very efficient to populate and to retrieve all items at once (e.g. range to array and array back to range), but much slower at re-sizing and inserting items in the middle. Each ReDim copies the entire memory block to a larger location, and if Preserve is used, all values are copied over as well. This may translate to perceived slowness for every operation (in a potential application).
More details (arrays vs collections) here (VB specific but it applies to VBA as well)
Collections are linked lists with hash-tables - quite slow to populate but after that you get instant access to any element in the collection, and just as fast at reordering (sorting) and re-sizing. This can translate into a slow opening file, but all other operations are instant. Other aspects:
Retrieve keys as well as the items associated with those keys
Handle case-sensitive keys
Items can be other collections, arrays, objects
While keys must be unique, they are also optional
An item can be returned in reference to its key, or in reference to its index value
Keys are always strings, and always case insensitive
Items are accessible and retrievable, but their keys are not
Cannot remove all items at once (either remove them one by one, or destroy and then recreate the Collection)
Enumerating with For...Each...Next, lists all items
More info here and here
Dictionaries: same as collections but with the extra benefit of the .Exists() method which, in some scenarios, makes them much faster than collections. Other aspects:
Keys are mandatory and always unique to that Dictionary
An item can only be returned in reference to its key
The key can take any data type; for string keys, by default a Dictionary is case sensitive
Exists() method to test for the existence of a particular key (and item)
Collections have no similar test; instead, you must attempt to retrieve a value from the Collection, and handle the resulting error if the key is not found
Items AND keys are always accessible and retrievable to the developer
Item property is read/write, so it allows changing the item associated with a particular key
Allows you to remove all items in a single step without destroying the Dictionary itself
Using For...Each...Next dictionaries will enumerate the keys
A Dictionary supports implicit adding of an item using the Item property.
In Collections, items must be added explicitly
More details here
Other links: optimizing loops and optimizing strings (same site)

Excel VBA variable use: Is it better to use multiple variables or one array to store information

I've been using VBA for about a month now, and this forum has been a great resource for my first "programming" language. As I've started to get more comfortable with VBA arrays, I've begun to wonder what the best way to store variables is, and I'm sure someone here knows the answer to what's probably a programming newb question:
Is there any difference, say, between having 10 String variables used independently of each other or an array of String variables used independently of each other (by independent I mean their position in the array doesn't matter for their use in the program). There are bits of code I use where I might have around 9 public variables. Is there any advantage to setting them as an array, despite the fact that I don't need to preserve their order vis a vis one another? e.g. I could have
Public x As String
Public y As String
Public v As String
Public w As String
Or
Public arr(1 to 4) As String
arr(1) = x
arr(2) = y
arr(3) = v
arr(4) = w
In terms of what I need to do with the code, these two versions are functionally equivalent. But is there a reason to use one rather than the other?
Connected to this, I can transpose an array into an Excel range and use xlUp and xlDown to move around the various values in the array. But I can also move through arrays in similar ways by looking for elements with a particular value or position in an array held "abstractly."* Sometimes I find it easier to manipulate array values once they have been transposed into a worksheet, using xlUp and xlDown. Apart from having to have dedicated worksheet space to do this, is this worse (time, processing power, reliability etc.) than looping through an "abstract"* array (if Application.ScreenUpdating = False)?
*This may mean something technical to mathematicians/ serious programmers - I'm trying to say an array that doesn't use the visual display of the worksheet grid.
EDIT:
Thank you for your interesting answers. I'm not sure if the second part of my question counts as a second question entirely and I'm therefore breaking a rule of the forum, or if it is connected, but I would be very happy to tick the answer that also considered it
Unless you need to refer to them sequentially or by index# dynamically do not use an array as a grouping of scratch variables. It is harder to read.
Memory-wise they should be nearly identical, with slightly more overhead on the array.
As others have noted, there's no need to use arrays for variables which are not related or part of a "set" of values. If however you find yourself doing this:
Dim email1 as String, email2 as String, email3 as String, _
email4 as String, email5 as String
then you should consider whether an array would be a better approach.
To the second part of your question: if you're using arrays in your VBA then it would be preferable to work with them directly in memory rather than dumping them to a worksheet and navigating them from there.
Keeping everything in-memory is going to be faster, and removes dependencies such as having to ensure there's a "scratch" worksheet around: such dependencies make your code less re-usable and more brittle.

In Lua, how should I handle a zero-based array index which comes from C?

Within C code, I have an array and a zero-based index used to lookup within it, for example:
char * names[] = {"Apple", "Banana", "Carrot"};
char * name = names[index];
From an embedded Lua script, I have access to index via a getIndex() function and would like to replicate the array lookup. Is there an agreed on "best" method for doing this, given Lua's one-based arrays?
For example, I could create a Lua array with the same contents as my C array, but this would require adding 1 when indexing:
names = {"Apple", "Banana", "Carrot"}
name = names[getIndex() + 1]
Or, I could avoid the need to add 1 by using a more complex table, but this would break things like #names:
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
What approach is recommended?
Edit: Thank you for the answers so far. Unfortunately the solution of adding 1 to the index within the getIndex function is not always applicable. This is because in some cases indices are "well-known" - that is, it may be documented that an index of 0 means "Apple" and so on. In that situation, should one or the other of the above solutions be preferred, or is there a better alternative?
Edit 2: Thanks again for the answers and comments, they have really helped me think about this issue. I have realized that there may be two different scenarios in which the problem occurs, and the ideal solution may be different for each.
In the first case consider, for example, an array which may differ from time to time and an index which is simply relative to the current array. Indices have no meaning outside the code. Doug Currie and RBerteig are absolutely correct: the array should be 1-based and getIndex should contain a +1. As was mentioned, this allows the code on both the C and Lua sides to be idiomatic.
The second case involves indices which have meaning, and probably an array which is always the same. An extreme example would be where names contains "Zero", "One", "Two". In this case, the expected value for each index is well-known, and I feel that making the index on the Lua side one-based is unintuitive. I believe one of the other approaches should be preferred.
Use 1-based Lua tables, and bury the + 1 inside the getIndex function.
I prefer
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
Some of the table-manipulation features - #, insert, remove, sort - are broken.
Others - concat(t, sep, 0), unpack(t, 0) - require explicit starting index to run correctly:
print(table.concat(names, ',', 0)) --> Apple,Banana,Carrot
print(unpack(names, 0)) --> Apple Banana Carrot
I hate constantly having to remember that +1 to cater to Lua's default 1-based indexing style.
Your code should reflect your domain-specific indices to be more readable.
If 0-based indices fit your task well, you should use 0-based indices in Lua.
I like how array indices are implemented in Pascal: you are absolutely free to choose any range you want, e.g., array[-10..-5]of byte is absolutely OK for an array of 6 elements.
This is where Lua metamethods and metatables come in handy. Using a table proxy and a couple of metamethods, you can modify access to the table in a way that fits your need.
local names = {"Apple", "Banana", "Carrot"} -- Original Table
local _names = names -- Keep private access to the table
local names = {} -- Proxy table, used to capture all accesses to the original table
local mt = {
    __index = function (t,k)
        return _names[k+1] -- Access the original table
    end,
    __newindex = function (t,k,v)
        _names[k+1] = v -- Update original table
    end
}
setmetatable(names, mt)
So what's going on here is that the original table has a proxy for itself, and the proxy catches every attempt to access the table. When the proxy is accessed, it increments the index it was accessed with, simulating a 0-based array. Here is the printed output:
print(names[0]) --> Apple
print(names[1]) --> Banana
print(names[2]) --> Carrot
print(names[3]) --> nil
names[3] = "Orange" --Add a new field to the table
print(names[3]) --> Orange
Reads and writes through the proxy act just as they would on a normal table, so you don't have to adjust the index by hand at each access.
EDIT: I'd like to point out that the new "names" table is merely a proxy to access the original names table. So if you queried for #names the result would be 0, because that table itself has no values. You'd need to query for #_names to get the size of the original table.
EDIT 2: As Charles Stewart pointed out in the comment below, you can add a __len metamethod to the mt table to ensure the #names call gives you the correct result (Lua 5.2 and later honor __len on tables; Lua 5.1 does not).
First of all, this situation is not unique to applications that mix Lua and C; you can face the same question even when using Lua only apps. To provide an example, I'm using an editor component that indexes lines starting from 0 (yes, it's C-based, but I only use its Lua interface), but the lines in the script that I edit in the editor are 1-based. So, if the user sets a breakpoint on line 3 (starting from 0 in the editor), I need to send a command to the debugger to set it on line 4 in the script (and convert back when the breakpoint is hit).
Now the suggestions.
(1) I personally dislike using [0] hack for arrays as it breaks too many things. You and Egor already listed many of them; most importantly for me it breaks # and ipairs.
(2) When using 1-based arrays I try to avoid indexing them and to use iterators as much as possible (for i, v in ipairs(...) do instead of for i = 1, #array do).
(3) I also try to isolate my code that deals with these conversions; for example, if you are converting between lines in the editor to manage markers and lines in the script, then have marker2script and script2marker functions that do the conversion (even if it's simple +1 and -1 operations). You'd have something like this anyway even without +1/-1 adjustments, it would just be implicit.
(4) If you can't hide the conversion (and I agree, +1 may look ugly), then make it even more noticeable: use c2l and l2c calls that do the conversion. In my opinion it's not as ugly as +1/-1, and it has the advantage of communicating the intent while giving you an easy way to search for all the places where the conversion happens. It's very useful when you are looking for off-by-one bugs or when API changes cause updates to this logic.
Overall, I wouldn't worry about these aspects too much. I'm working on a fairly complex Lua app that wraps several 0-based C components and don't remember any issues caused by different indexing...
Why not just turn the C-array into a 1-based array as well?
char * names[] = {NULL, "Apple", "Banana", "Carrot"};
char * name = names[index];
Frankly, this will lead to some unintuitive code on the C-side, but if you insist that there must be 'well-known' indices that work in both sides, this seems to be the best option.
A cleaner solution is of course not to make those 'well-known' indices part of the interface. For example, you could use named identifiers instead of plain numbers. Enums are a nice match for this on the C side, while in Lua you could even use strings as table keys.
Another possibility is to encapsulate the table behind an interface so that the user never accesses the array directly but only via a C-function call, which can then perform arbitrarily complex index transformations. Then you only need to expose that C function in Lua and you have a clean and maintainable solution.
Why not present your C array to Lua as userdata? The technique is described with code in PiL, section 'Userdata'; you can set the __index, __newindex, and __len metatable methods, and you can inherit from a class to provide other sequence manipulation functions as regular methods (e.g., define an array with array.remove, array.sort, array.pairs functions, which can be defined as object methods by a further tweak to __index). Doing things this way means you have no "synchronisation" issues between Lua and C, and it avoids risks that "array" tables get treated as ordinary tables resulting in off-by-one errors.
You can fix this lua-flaw by using an iterator that is aware of different index bases:
function iarray(a)
    local n = 0
    local s = #a
    if a[0] ~= nil then
        n = -1
    end
    return function()
        n = n + 1
        if n <= s then return n,a[n] end
    end
end
However, you still have to add the zeroth element manually:
Usage example:
myArray = {1,2,3,4,5}
myArray[0] = 0
for _,e in iarray(myArray) do
-- do something with element e
end

Resources