In Lua, how should I handle a zero-based array index which comes from C? - c

Within C code, I have an array and a zero-based index used to lookup within it, for example:
char * names[] = {"Apple", "Banana", "Carrot"};
char * name = names[index];
From an embedded Lua script, I have access to index via a getIndex() function and would like to replicate the array lookup. Is there an agreed on "best" method for doing this, given Lua's one-based arrays?
For example, I could create a Lua array with the same contents as my C array, but this would require adding 1 when indexing:
names = {"Apple", "Banana", "Carrot"}
name = names[getIndex() + 1]
Or, I could avoid the need to add 1 by using a more complex table, but this would break things like #names:
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
What approach is recommended?
Edit: Thank you for the answers so far. Unfortunately the solution of adding 1 to the index within the getIndex function is not always applicable. This is because in some cases indices are "well-known" - that is, it may be documented that an index of 0 means "Apple" and so on. In that situation, should one or the other of the above solutions be preferred, or is there a better alternative?
Edit 2: Thanks again for the answers and comments, they have really helped me think about this issue. I have realized that there may be two different scenarios in which the problem occurs, and the ideal solution may be different for each.
In the first case consider, for example, an array which may differ from time to time and an index which is simply relative to the current array. Indices have no meaning outside the code. Doug Currie and RBerteig are absolutely correct: the array should be 1-based and getIndex should contain a +1. As was mentioned, this allows the code on both the C and Lua sides to be idiomatic.
The second case involves indices which have meaning, and probably an array which is always the same. An extreme example would be where names contains "Zero", "One", "Two". In this case, the expected value for each index is well-known, and I feel that making the index on the Lua side one-based is unintuitive. I believe one of the other approaches should be preferred.

Use 1-based Lua tables, and bury the + 1 inside the getIndex function.

I prefer
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
Some of table-manipulation features - #, insert, remove, sort - are broken.
Others - concat(t, sep, 0), unpack(t, 0) - require explicit starting index to run correctly:
print(table.concat(names, ',', 0)) --> Apple,Banana,Carrot
print(unpack(names, 0)) --> Apple Banana Carrot
I hate constantly remembering of that +1 to cater Lua's default 1-based indices style.
You code should reflect your domain specific indices to be more readable.
If 0-based indices are fit well for your task, you should use 0-based indices in Lua.
I like how array indices are implemented in Pascal: you are absolutely free to choose any range you want, e.g., array[-10..-5]of byte is absolutely OK for an array of 6 elements.

This is where Lua metemethods and metatables come in handy. Using a table proxy and a couple metamethods, you can modify access to the table in a way that would fit your need.
local names = {"Apple", "Banana", "Carrot"} -- Original Table
local _names = names -- Keep private access to the table
local names = {} -- Proxy table, used to capture all accesses to the original table
local mt = {
__index = function (t,k)
return _names[k+1] -- Access the original table
end,
__newindex = function (t,k,v)
_names[k+1] = v -- Update original table
end
}
setmetatable(names, mt)
So what's going on here, is that the original table has a proxy for itself, then the proxy catches every access attempt at the table. When the table is accessed, it increment the value it was accessed by, simulating a 0-based array. Here are the print result:
print(names[0]) --> Apple
print(names[1]) --> Banana
print(names[2]) --> Carrot
print(names[3]) --> nil
names[3] = "Orange" --Add a new field to the table
print(names[3]) --> Orange
All table operations act just as they would normally. With this method you don't have to worry about messing with any unordinary access to the table.
EDIT: I'd like to point out that the new "names" table is merely a proxy to access the original names table. So if you queried for #names the result would be nil because that table itself has no values. You'd need to query for #_names to access the size of the original table.
EDIT 2: As Charles Stewart pointed out in the comment below, you can add a __len metamethod to the mt table to ensure the #names call gives you the correct results.

First of all, this situation is not unique to applications that mix Lua and C; you can face the same question even when using Lua only apps. To provide an example, I'm using an editor component that indexes lines starting from 0 (yes, it's C-based, but I only use its Lua interface), but the lines in the script that I edit in the editor are 1-based. So, if the user sets a breakpoint on line 3 (starting from 0 in the editor), I need to send a command to the debugger to set it on line 4 in the script (and convert back when the breakpoint is hit).
Now the suggestions.
(1) I personally dislike using [0] hack for arrays as it breaks too many things. You and Egor already listed many of them; most importantly for me it breaks # and ipairs.
(2) When using 1-based arrays I try to avoid indexing them and to use iterators as much as possible: for i, v in ipairs(...) do instead of for i = 1, #array do).
(3) I also try to isolate my code that deals with these conversions; for example, if you are converting between lines in the editor to manage markers and lines in the script, then have marker2script and script2marker functions that do the conversion (even if it's simple +1 and -1 operations). You'd have something like this anyway even without +1/-1 adjustments, it would just be implicit.
(4) If you can't hide the conversion (and I agree, +1 may look ugly), then make it even more noticeable: use c2l and l2c calls that do the conversion. In my opinion it's not as ugly as +1/-1, but has the advantage of communicating the intent and also gives you an easy way to search for all the places where the conversion happens. It's very useful when you are looking for off-one bugs or when API changes cause updates to this logic.
Overall, I wouldn't worry about these aspects too much. I'm working on a fairly complex Lua app that wraps several 0-based C components and don't remember any issues caused by different indexing...

Why not just turn the C-array into a 1-based array as well?
char * names[] = {NULL, "Apple", "Banana", "Carrot"};
char * name = names[index];
Frankly, this will lead to some unintuitive code on the C-side, but if you insist that there must be 'well-known' indices that work in both sides, this seems to be the best option.
A cleaner solution is of course not to make those 'well-known' indices part of the interface. For example, you could use named identifiers instead of plain numbers. Enums are a nice match for this on the C side, while in Lua you could even use strings as table keys.
Another possibility is to encapsulate the table behind an interface so that the user never accesses the array directly but only via a C-function call, which can then perform arbitrarily complex index transformations. Then you only need to expose that C function in Lua and you have a clean and maintainable solution.

Why not present your C array to Lua as userdata? The technique is described with code in PiL, section 'Userdata'; you can set the __index, __newindex, and __len metatable methods, and you can inherit from a class to provide other sequence manipulation functions as regular methods (e.g., define an array with array.remove, array.sort, array.pairs functions, which can be defined as object methods by a further tweak to __index). Doing things this way means you have no "synchronisation" issues between Lua and C, and it avoids risks that "array" tables get treated as ordinary tables resulting in off-by-one errors.

You can fix this lua-flaw by using an iterator that is aware of different index bases:
function iarray(a)
local n = 0
local s = #a
if a[0] ~= nil then
n = -1
end
return function()
n = n + 1
if n <= s then return n,a[n] end
end
end
However, you still have to add the zeroth element manually:
Usage example:
myArray = {1,2,3,4,5}
myArray[0] = 0
for _,e in iarray(myArray) do
-- do something with element e
end

Related

What is the difference between indexing an array and a dictionary?

From my understanding, an array is a simple table of values, such as local t = {"a","b","c"}, and a dictionary is a table of objects, such as local t = {a = 1, b = 2, c = 3} Of course, let me know if I'm wrong in either or both cases.
Anyways, my question lies in how we index the entries in either of these cases. For example, let's say I have the following code:
local t = {"TestEntry"}
print(t["TestEntry"])
Of course, this prints nil. However, when we use a dictionary the same way:
local t = {TestEntry = 1}
print(t["TestEntry"])
This, naturally, prints 1. My question is, why does it work this way for dictionaries, but not arrays?
Finally, I'd like to address the issue that led me to this question. Let's say, before I want to run a chunk of code, I need to see if a specific value is inside a table. It would be convenient if I could just check if it is in the table with table["GivenEntry"], but, as we have seen, this would only work if the entry in the table is actually an object. In my specific case, I am simply using an array, so it is not an object.
Thus, I had to resort to using a for loop to check the table:
local t = {"TestEntry1","TestEntry2"}
for i,v in pairs(t) do
if v == "TestEntry1" then
--do code
end
end
After doing this, it almost seemed as if it would be easier to create a silly dictionary, like:
local t = {TestEntry1 = "TestEntry1"}
because then, I could simply run t["TestEntry1"], and I wouldn't have to worry about having an empty table (because then the for loop would not run). Are there ramifications to creating a dictionary for such purposes? Is it less efficient in general?
Your input is appreciated,
Thank you.
In Lua both arrays an dictionaries are the same type (the table). local t = {"TestEntry"} is essentially short for local t = {[1] = "TestEntry"} (The brackets are needed by Lua for a number, you would access it with t[1]).
So the options for checking if "TestEntry1" is in the table are as you have written. A dictionary takes more memory and depending on how many values you have may take a while to create, but accessing a key should be constant time. Whereas to loop through the table will take longer and longer the more items you have so it is a tradeoff you have to decide on.
There are faster ways to search an array however (e.g. if it is sorted: https://en.wikipedia.org/wiki/Binary_search_algorithm)

In Matlab, can I access an element of an array, which is in turn a value of a container.Map?

Here is a code snippet, that shows what I want and the error, that follows:
a = [1, 2];
m = containers.Map('KeyType','char', 'ValueType','any');
m('stackoverflow.com') = a;
pull_the_first_element_of_the_stored_array = m('stackoverflow.com')(1);
??? Error: ()-indexing must appear last in an index expression.
How do I access an element of the array, which is in turn a value of a map object?
I could have done this:
temp = m('stackoverflow.com');
pull_the_first_element_of_the_stored_array = temp(1);
But I do not want to create an intermediate array only to pull a single value out of it.
EDIT : This is a duplicate of How can I index a MATLAB array returned by a function without first assigning it to a local variable? The answer is there.
This is another case where you can get around syntax limitations with small helper functions. EG:
getFirst = #(x)x(1);
pull_the_first_element_of_the_stored_array = getFirst(m('stackoverflow.com'));
This still needs two lines, but you can often reuse the function definition. More generally, you could write:
getNth = #(x, n) x(n);
And then use:
getNth (m('stackoverflow.com'),1);
Although this question is a duplicate of this previous question, I feel compelled to point out one small difference between the problems they are addressing, and how my previous answer could be adapted slightly...
The previous question dealt with how to get around the syntax issue involved in having a function call immediately followed by an indexing operation on the same line. This question instead deals with two indexing operations immediately following one another on the same line. The two solutions from my other answer (using SUBSREF or a helper function) also apply, but there is actually an alternative way to use SUBSREF that combines the two indexing operations, like so:
value = subsref(m,struct('type','()','subs',{'stackoverflow.com',{1}}));
Note how the sequential index subscripts 'stackoverflow.com' and 1 are combined into a cell array to create a 1-by-2 structure array to pass to SUBSREF. It's still an ugly one-liner, and I would still advocate using the temporary variable solution for the sake of readability.

What's the purpose of arrays starting with nonzero index?

I tried to find answers, but all I got was answers on how to realize arrays starting with nonzero indexes. Some languages, such as pascal, provide this by default, e.g., you can create an array such as
var foobar: array[1..10] of string;
I've always been wondering: Why would you want to have the array index not to start with 0?
I guess it may be more familiar for beginners to have arrays starting with 1 and the last index being the size of the array, but on a long-term basis, programmers should get used to values starting with 0.
Another purpose I could think of: In some cases, the index could actually represent something thats contained in the respective array-entry. e.g., you want to get all capital letters in an array, it may be handy to have an index being the ASCII-Code of the respective letter. But its pretty easy just to subtract a constant value. In this example, you could (in C) simply do something like this do get all capital letters and access the letter with ascii-code 67:
#define ASCII_SHIFT 65
main()
{
int capital_letters[26];
int i;
for (i=0; i<26; i++){
capital_letters[i] = i+ASCII_SHIFT;
}
printf("%c\n", capital_letters[67-ASCII_SHIFT]);
}
Also, I think you should use hash tables if you want to access entries by some sort of key.
Someone might retort: Why should the index always start with 0? Well, it's a hell of a lot simpler this way. You'll be faster when you just have to type one index when declaring an array. Also, you can always be sure that the first entry is array[0] and the last one is array[length_of_array-1]. It is also common that other data structures start with 0. e.g., if you read a binary file, you start with the 0th byte, not the first.
Now, why do some programming languages have this "feature" and why do some people ask how to achieve this in languages such as C/C++?, is there any situation where an array starting with a nonzero index is way more useful, or even, something simply cannot be done with an array starting at 0?
If your index means something, e.g. an id from a database or some such, then it's useful.
Oh, and you can't use hashes because you want to use it with some other piece of code that expects arrays.
For example, Rails checkboxes. They're passed from the web form as arrays but in my code I want to access the udnerlying database object. The array index is the id, et voila!
Non-zero based arrays are a natural extension of arrays with ordinal indexes that are not integers. In Pascal you can have arrays like:
var
letter_count : array['a'..'z'] of integer;
Or:
type
flags = (GREEN, YELOW, RED);
var
flags_seen = array[flags] of boolean;
A classic is an array with negative indexes:
zero_centered_grid = array[-N..N,-N..N] of sometype;
The idea is that:
Many indexing errors can be detected at compile time if the declaration of indexes is more specific.
Some algorithms (heaps come to mind) have cleaner implementations when the minimum index is something different from zero.
Languages with only zero-based arrays use well defined idioms for the latter, and have efficient implementations of dictionaries/maps for the rest.

Why do Lua arrays(tables) start at 1 instead of 0?

I don't understand the rationale behind the decision of this part of Lua. Why does indexing start at 1? I have read (as many others did) this great paper. It seems to me a strange corner of a language that is very pleasant to learn and program. Don't get me wrong, Lua is just great but there has to be an explanation somewhere. Most of what I found (on the web) is just saying the index starts at 1. Full stop.
It would be very interesting to read what its designers said about the subject.
Note that I am "very" beginner in Lua, I hope I am not missing something obvious about tables.
Lua is descended from Sol, a language designed for petroleum engineers with no formal training in computer programming. People not trained in computing think it is damned weird to start counting at zero. By adopting 1-based array and string indexing, the Lua designers avoided confounding the expectations of their first clients and sponsors.
Although I too found them weird at the beginning, I have learned to love 0-based arrays. But I get by OK with Lua's 1-based arrays, especially by
using Lua's generic for loop and the ipairs operator—I can usually avoid worrying about just how arrays are indexed.
In Programming in Lua's first discussion of tables, they mention:
Since you can index a table with any value, you can start the indices of an array with any number that pleases you. However, it is customary in Lua to start arrays with 1 (and not with 0, as in C) and several facilities stick to this convention.
Later on, in the chapter on data structures, they say almost the same thing again: that Lua's built-in facilities assume 1-based indexing.
Anyway, there are a couple conveniences to using 1-based indexing. Namely, the # (length) operator: t[#t] access the last (numeric) index of the table, and t[#t+1] accesses 1 past the last index. To someone who hasn't already been exposed to 0-based indexing, #t+1 would be more intuitive to move past the end of a list. There's also Lua's for i = 1,#t construct, which I believe falls under the same category as the previous point that "1 to the length" can be more sensible than indexing "0 to the length minus 1".
But, if you can't break the mindset of 0-based indexing, then Lua's 1-based indexing can certainly be more of a hindrance. Ultimately, the authors wanted something that worked for them; and I'll admit I don't know what their original goal was, but it's probably changed since then.
My understanding is that it's that way just because the authors thought it would be a good way to do it, and after they rolled the language out to the public that decision calcified considerably. (I suspect there would be hell to pay were they to change it today!) I've never seen a particular justification beyond that.
Perhaps a less significant point, but one I haven't heard mentioned yet: there is better symmetry in the fact that the first and last characters in a string are at 1 and -1 respectively, instead of 0 and -1.
Lua libraries prefer to use indices which start at 1. However, you can use any index you want. You can use 0, you can use 1, you can use -5. It is even in their manual, which can be found at (https://www.lua.org/pil/11.1.html).
In fact, something cool here is internal lua libraries will treat SOME passed 0's as 1's. Just be cautious when using ipairs.
So that: ("abc"):sub(0,1) == "a" and ("abc"):sub(1,1) == "a" will be true.
You can start an array at index 0, 1, or any other value:
-- creates an array with indices from -5 to 5
a = {}
for i=-5, 5 do
a[i] = 0
end
The specific definitions of array index in C and Lua, are different.
In C array, it means: item address offset of the array address.
In Lua array, it means: the n-th item in array.
Why most languages use 0-based index? Because the compiler code with offset definition is more convenient and effective. They mostly handle addresses.
And the Lua. This is the code of lua 5.3.5 for table index with C:
const TValue *luaH_getint (Table *t, lua_Integer key) {
if (l_castS2U(key) - 1 < t->sizearray)
return &t->array[key - 1];
else {
Node *n = hashint(t, key);
for (;;) {
if (ttisinteger(gkey(n)) && ivalue(gkey(n)) == key)
return gval(n);
else {
int nx = gnext(n);
if (nx == 0) break;
n += nx;
}
}
return luaO_nilobject;
}
}
We should focus on the code &t->array[key - 1], it have a subtraction operation. It is not effective compared with 0-based index.
But, the 1-based index is more neared with human being languages. We focus more on n-th item in English, Chinese, Japanese and also.
So, I guess the Lua designers choose 1-based index, they choose easy understanding for pure newer of program, give up the convenience and effectiveness.
In your example, table[0] will always return nil(null), unless you assign value to it yourself, like table[0] = 'some value' and then table[0] will return 'some value', which you assigned.
Here's an example:
tbl = {"some"}
print("tbl[0]=" .. tostring(tbl[0]))
print("tbl[1]=" .. tostring(tbl[1]))
nothing = {}
print("nothing[0]=" .. tostring(nothing[0]))
print("nothing[1]=" .. tostring(nothing[1]))
nothing[0] = "hey"
print("(after assign)\nnothing[0]=" .. tostring(nothing[0]))
The real reason is that the language is an implementation of the definition in a law of Portugal and the major development centre was in Brazil and their preference is to avoid the use of zero or empty or nothing as an index or subscript. However the language does permit the use of a start index other than 1 in a table creating function in some versions.
It makes sense to everyone, that if
table = {}
table is empty. So, when
table == {something}
The table contains something so what it contains is index 1 in table, if you know what I mean.
What I meant is that table[0] exists, and its table = {}, which is empty, now a programmer won't call a empty table, it sets them, and then fills it, it will be useless to find an empty table every time you want to call it, so it's simpler to just create an empty table.

How can I master the idea of arrays?

I totally understand the purpose of arrays, yet I do not feel I have "mastered" them. Does anyone have some really good problems or readings involving arrays. I program in PHP and C++ so if there are examples with those languages that would be preferable but is not necessary.
Draw everything out on graph paper.
Memory is just little boxes, it's a lot clearer to see on paper with a few arrows than in some complex markup language
Define 'mastered'.
Does anyone have some really good problems or readings involving arrays.
Try array based
Linked-list implementation.
Implement stacks, queues, etc and then use stack(s) to emulate a queue etc
Heaps
A lot of people seem to struggle with the concept of arrays at first, particularly arrays of >2 dimensions.
It's a little too abstract. However, getting past that initial block just requires a concrete exploration of the mechanics. So, here's an example I've used a lot that shows the basic mechanics in a nerd-friendly way:
Concrete Example (in psuedocode)
Let's say you're building a role playing game and you want to keep track of your character's stats. You could use an array of integers like this:
Stats(0) could be strength
Stats(1) could be dexterity
Stats(2) could be intelligence
...And so on.
Now let's add a level of complexity. Maybe we want to introduce a potion of strength that increases strength by 5 for 10 turns. We could represent the stat side of that by making this into a 2-dimensional array:
Stats(0, 0) - this is my current strength.
Stats(0, 1) - this is my normal strength.
Stats(0, 2) - this is the number of turns until the strength potion wears off.
Stats(1, 0) - this is my current dexterity.
...And so on. You get the idea.
So I've added a 2nd dimension to hold details about the 1st dimension. What if I wanted our Stats array to handle statistics for more than one character? I could represent that by making this into a 3-dimensional array:
Stats(0, 0, 0) - character 0's current strength.
Stats(1, 1, 0) - character 1's current dexterity.
It would be even better to create some constants or enums to eliminate magic numbers from the code:
Const Strength = 0
Const Dexterity = 1
Const Intelligence = 2
Const CurrentValue = 0
Const NormalValue = 1
Const PotionTurns = 2
Then I could do:
Stats(1, Dexterity, NormalValue) = 5 'For character 1, set the normalvalue of dex to 5.
A few more thoughts about arrays... At least in the .Net world where I live, most of us don't have to use them in our day-to-day lives too often because they're slowly being relegated to underpinnings for more elaborate data structures like collections.
In fact, if I were to implement character stats, realistically I would not use arrays.
However, it's still important to get your head around them because they are rocket-fast and there are definitely cases where they're invaluable.
Try manually (no built in methods) sorting with arrays (bubblesort is a good one to get yourself going)
An array is a contiguous block of memory devoted to N items of the same type, where N is a fixed number indicating the number of items. In order to expand an array (as they are fixed in size), you can use the C function realloc(..). In C++, you can manually re-allocate an array by creating a new, larger array and copying the contents of the old array into the new one (an expensive operation).
Modern languages have options that supplant arrays (so as to overcome the inherent limitations of arrays). In C++, you can use the Standard Template Library (STL) for this purpose; the STL "vector" data type is a typical replacement for standard arrays in C++. Platform-dependent APIs like MFC have built-in constructs like the CArrayList for a richer array usage experience. ;-)
I'm going to try to explain the best i can arrays in their fundamental form.
Ok, so an array is basically a way to store data. For instance if you want a list of shopping items, we would use a one dimensional array:
[0] - "Bread"
[1] - "Milk"
[2] - "Eggs"
[3] - "Butter"
.
.
.
[n] - "Candy"
Each index 0,1,2,3,...n holds a specific data. The data as you can see are represented as Strings. Now i can't use something like :
[n+1] = 1000
because this will put an integer as index n+1, the compiler will tell you that it's no good and you need to fix that issue.
Moving on to matrices or 2-dimensional arrays. Take a piece of squared paper like the ones you use for math and draw a Cartesian system and some dots. Put the coordinates on different piece of paper and next to them put a 1. For example:
[0,0] = 1
[0,1] = 1
[2,3] = 1
What this means is that at index [0,0],[0,1],[2,3] i have 1. A representation would be like so:
Cartesian System in form of a matrices :
1) 2) 3)
1) 1 1 0
2) 0 0 1
3) 0 0 0
I used just simple arrays to illustrate what are they, for example if you want to go 3D the data structure for that would be a array of matrices aka a list which holds each Cartesian location at a specific height.
If we had 10 as height and the same dots as above it would be something like:
[10,0,0] - 1
[10,0,1] - 1
[10,2,3] - 1
If you want a way to master: grab a simple list of problems and try to implement them using arrays of any kind. There is no quick way to do it, just have patience and practice.
Alright, so a one-dimensional array is just a grouping of variables. Useful if you've got a lot of something, and you like to save time and space. Functionally,
element1:=5;
element2:=6;
element3:=7;
...
is the same as saying
element[1]:=5;
element[2]:=6;
element[3]:=7;
...
except now the computer knows what you're talking about and you can write something like:
for i:=1 to n do
element[i]:=element[i]+1;
Moving on, a two dimensional array is somewhat more complicated, but can be thought of as an array of arrays. So we could have something like this:
type
arrayA=array[1..50] of integer;
arrayB=array[1..50] of arrayA;
arrayB=array[1..50,1..50] of integer; //an equivalent declaration in Pascal to the above two
More specifically, a two-dimensional array is a table. So for example, if arrayA contains the grades of a student in 50 classes, then arrayB represents the grades of a bunch of students. So, arrayB[3,5] would be the grade of the third student on class number 3.
It's easy to expand the same logic to add another dimension to the array:
arrayC=array[1..50] of arrayB;
We could say that C represents a school, so arrayC[2,4,6] is the grade of the second schools fourth student in class 6.
Now, what are arrays used for? Storing groups of similar information, or information which'll need to be processed in bulk. In practice, though, you'll mostly be using a one-, at most two-, dimensional array. If you can imagine your data as a table, you'll almost definitely want an two-dimensional array, for example. How else would you represent a chess board, if you had to?
Three-dimensional arrays tend to be used less, but what if you had to save some sort of state for each of your table elements? Something like:
Table=array[1..50, 1..50, 0..1] of integer;
Then the third value is 1 if some condition is true for a given element, and 0 otherwise. Of course, a trivial example, but it's another way of understand three-dimensional arrays more easily.

Resources