Why tuples are not enumerable in Elixir? - arrays

I need an efficient structure for array of thousands of elements of the same type with ability to do random access.
While list is most efficient on iteration and prepending, it is too slow on random access, so it does not fit my needs.
Map works better. Howerver it causes some overheads because it is intended for key-value pairs where key may be anything, while I need an array with indexes from 0 to N. As a result my app worked too slow with maps. I think this is not acceptable overhead for such a simple task like handling ordered lists with random access.
I've found that tuple is most efficient structure in Elixir for my task. When comparing to map on my machine it is faster
on iteration - 1.02x for 1_000, 1.13x for 1_000_000 elements
on random access - 1.68x for 1_000, 2.48x for 1_000_000
and on copying - 2.82x for 1_000, 6.37x for 1_000_000.
As a result, my code on tuples is 5x faster than the same code on maps. It probably does not need explanation why tuple is more efficient than map. The goal is achieved, but everybody tells "don't use tuples for a list of similar elements", and nobody can explain this rule (example of such cases https://stackoverflow.com/a/31193180/5796559).
Btw, there are tuples in Python. They are also immutable, but still iterable.
So,
1. Why tuples are not enumerable in Elixir? Is there any technical or logical limitation?
2. And why should not I use them as lists of similar elements? Is there any downsides?
Please note: the questions is "why", not "how". The explanation above is just an example where tuples works better than lists and maps.

1. The reason not to implement Enumerable for Tuple
From the retired Elixir talk mailing list:
If there is a
protocol implementation for tuple it would conflict with all records.
Given that custom instances for a protocol virtually always are
defined for records adding a tuple would make the whole Enumerable
protocol rather useless.
-- Peter Minten
I wanted tuples to be enumerable at first, and even
eventually implemented Enumerable on them, which did not work out.
-- Chris Keele
How does this break the protocol? I'll try to put things together and explain the problem from the technical point of view.
Tuples. What's interesting about tuples is that they are mostly used for a kind of duck typing using pattern matching. You are not required to create new module for new struct every time you want some new simple type. Instead of this you create a tuple - a kind of object of virtual type. Atoms are often used as first elements as type names, for example {:ok, result} and {:error, description}. This is how tuples are used almost anywhere in Elixir, because this is their purpose by design. They are also used as a basis for "records" that comes from Erlang. Elixir has structs for this purpose, but it also provides module Record for compatibility with Erlang. So in most cases tuples represent single structures of heterogenous data which are not meant to be enumerated. Tuples should be considered like instances of various virtual types. There is even #type directive that allows to define custom types based on tuples. But remember they are virtual, and is_tuple/1 still returns true for all those tuples.
Protocols. On the other hand, protocols in Elixir is a kind of type classes which provide ad hoc polymorphism. For those who come from OOP this is something similar to superclasses and multiple inheritance. One important thing that protocol is doing for you is automatic type checking. When you pass some data to a protocol function, it checks that the data belongs to this class, i.e. that protocol is implemented for this data type. If not then you'll get error like this:
** (Protocol.UndefinedError) protocol Enumerable not implemented for {}
This way Elixir saves your code from stupid mistakes and complex errors unless you make wrong architectural decisions
Altogether. Now imagine we implement Enumerable for Tuple. What it does is making all tuples enumerable while 99.9% of tuples in Elixir are not intended to be so. All the checks are broken. The tragedy is the same as if all animals in the world begin quacking. If a tuple is passed to Enum or Stream module accidentally then you will not see useful error message. Instead of this your code will produce unexpected results, unpredictable behaviour and possibly data corruption.
2. The reason not to use tuples as collections
Good robust Elixir code should contain typespecs that help developers to understand the code, and give Dialyzer ability to check the code for you. Imagine you want a collection of similar elements. The typespec for lists and maps may look like this:
#type list_of_type :: [type]
#type map_of_type :: %{optional(key_type) => value_type}
But you can't write same typespec for tuple, because {type} means "a tuple of single element of type type". You can write typespec for a tuple of predefined length like {type, type, type} or for a tuple of any elements like tuple(), but there is no way to write a typespec for a tuple of similar elements just by design. So choosing tuples to store your collection of elemenets means you lose such a nice ability to make your code robust.
Conclusion
The rule not to use tuples as lists of similar elements is a rule of thumb that explains how to choose right type in Elixir in most cases. Violation of this rule may be considered as possible signal of bad design choice. When people say "tuples are not intended for collections by design" this means not just "you do something unusual", but "you can break the Elixir features by doing wrong design in your application".
If you really want to use tuple as a collection for some reason and you are sure you know what you do, then it is a good idea to wrap it into some struct. You can implement Enumerable protocol for your struct without risk to break all things around tuples. It worth to note that Erlang uses tuples as collections for internal representation of array, gb_trees, gb_sets, etc.
iex(1)> :array.from_list ['a', 'b', 'c']
{:array, 3, 10, :undefined,
{'a', 'b', 'c', :undefined, :undefined, :undefined, :undefined, :undefined,
:undefined, :undefined}}
Not sure if there is any other technical reason not to use tuples as collections. If somebody can provide another good explanation for the conflict between the Record and the Enumerable protocol, he is welcome to improve this answer.

As you are sure you need to use tuples there, you might achieve the requested functionality at a cost of compilation time. The solution below will be compiling for long (consider ≈100s for #max_items 1000.) Once compiled the execution time would gladden you. The same approach is used in Elixir core to build up-to-date UTF-8 string matchers.
defmodule Tuple.Enumerable do
defimpl Enumerable, for: Tuple do
#max_items 1000
def count(tuple), do: tuple_size(tuple)
def member?(_, _), do: false # for the sake of compiling time
def reduce(tuple, acc, fun), do: do_reduce(tuple, acc, fun)
defp do_reduce(_, {:halt, acc}, _fun), do: {:halted, acc}
defp do_reduce(tuple, {:suspend, acc}, fun) do
{:suspended, acc, &do_reduce(tuple, &1, fun)}
end
defp do_reduce({}, {:cont, acc}, _fun), do: {:done, acc}
defp do_reduce({value}, {:cont, acc}, fun) do
do_reduce({}, fun.(value, acc), fun)
end
Enum.each(1..#max_items-1, fn tot ->
tail = Enum.join(Enum.map(1..tot, & "e_★_#{&1}"), ",")
match = Enum.join(["value"] ++ [tail], ",")
Code.eval_string(
"defp do_reduce({#{match}}, {:cont, acc}, fun) do
do_reduce({#{tail}}, fun.(value, acc), fun)
end", [], __ENV__
)
end)
defp do_reduce(huge, {:cont, _}, _) do
raise Protocol.UndefinedError,
description: "too huge #{tuple_size(huge)} > #{#max_items}",
protocol: Enumerable,
value: Tuple
end
end
end
Enum.each({:a, :b, :c}, fn e -> IO.puts "Iterating: #{e}" end)
#⇒ Iterating: a
#  Iterating: b
#  Iterating: c
The code above explicitly avoids the implementation of member?, since it would take even more time to compile while you have requested the iteration only.

Related

Did the Designers of Rust ever publicly say why the syntax of indexing an array is different to indexing a tuple?

In Rust a tuple can be indexed using a dot (e.g.: x.0), while an array can be indexed with square brackets (e.g.: x[0]). At first glance this seems to me as if it would make it harder to refactor existing code, without serving any actual purpose. However, I am probably just missing something. Did the creators of Rust ever comment on this and told us why they chose to build the language that way?
This tuple field access syntax was introduced in RFC 184 (discussion thread). Before that, you had to destructure all tuples (and tuple structs) to access their values, or use special traits in the standard library.
The RFC itself does not go into a lot of detail regarding the alternative [index] syntax, but the discussion thread does. I see three main reasons why we ended up with the .index syntax:
[] is related to std::ops::Index
The [index] syntax is strongly related to the ops::Index trait. It allows you to overload that operator for your own types. The way the trait is designed, the index method (which is called when you use []) has to return that same type every time. So ops::Index cannot be used for heterogeneous types. And since [] is very related to the trait, it might be strange to have a few special usages of [] that don't use std::ops::Index.
As also pointed out on reddit, indexing as such (i.e. tuple[0], tuple[1] etc.) wouldn't make sense as an alternative, because tuples are heterogenous. (They definitely couldn't be made to implement the Index* traits.)
— Comment
Indexing syntax [is] actually a really bad fit. Notably, indexing syntax everywhere else has a consistent type, but a tuple is heterogenous so a[0] and a[1] would have different types.
— Comment
Tuples/tuple structs as structs with anonymous fields
Rust has other heterogeneous data types: structs. And you access their fields with the .field syntax. And describing tuple structs as structs with unnamed fields, and describing tuples as unnamed tuple structs, it makes sense to treat both a bit like structs. Then, the .0 just feels like referencing an unnamed field.
I feel, especially in statically-typed languages, the types of a[1], a[2], ..., a[N] should be the same.
Yes, that's why nobody is advocating for adding indexing syntax to tuples. The tuple.1 syntax is a much better fit; it's basically anonymous field access, rather than indexing.
— Comment
Tuples and tuple structs are just structs with anonymous fields ordered by their definition, so both should support syntax for accessing a field as an lvalue directly to make working with them more consistent and easier.
— Comment
Influenced by Swift
Swift already had that syntax and seems to have been an influence:
For reference, Swift allows this tuple indexing syntax, and allows for assignment with it.
— Comment
+1 swift has a lot of nice pragmatic tweaks, and this is one of them, IMO.
— Comment
thanks to swift there's going to be a large community familiar with the .0 .1 ... notation
— Comment
As an aside: as can be seen by this thread, the "designers of Rust" really are, for the most part, just the community members. Sure, the RFC process had its problems back then (and it's still not perfect), but you can read most of that discussion online, with community members commenting on the proposed syntax.
I haven't seen any explicit mention of the reasoning behind the decision but it makes sense considering the original meaning of the indexing syntax. Dating back to C, arrays were contiguous meaning they stored data such that each byte of data began as the previous one left off. When you accessed the Nth element of an array, you were really accessing the data which started N * sizeof(type) bytes away from the start of the array.
---------------------
| 0 | 1 | 2 | 3 | 4 |
---------------------
This method only works when each value in the array is of the same type and therefore uses the same amount of memory, as is the case with Rust arrays and slices but not necessarily so with the tuple type. Had the designers allowed access to tuple data with indices it may have caused those learning Rust to make incorrect assumptions about the use of tuples.
Tuples are generally used to make simple "data classes" which only store data and do not need any associated methods. So you should consider a tuple a struct where all fields are public and accessed by number.

How to access an element in 2d array in Smalltalk

I started coding in Smalltalk and got stuck here. I have this 2d array:
testArr := Array new: 1.
testArr at: 1
put: ((Array new: 3)
at: 1 put: '1A';
at: 2 put: '1B';
at: 3 put: '1C';
yourself).
But if I want to access lets say first element of first array, what should I write to make it happen?
Thanks!
I'm tempted to take advantage of your question & answer, and elaborate a little bit on knowledge that belongs in the Smalltalk folklore (which you may already be aware of).
As we progresses in the use of Smalltalk, we will likely note that the Array class starts to play a decreasing role in our models. Why is that? Because it takes time to find out which are the objects our models will produce; the balance between too many and too few is a delicate matter, mostly clueless at the beginning.
Arrays and their composition are handy data structures. However, they solve the problem of organizing data at the expense of dealing with it. If the client of such a structure needs to know how data is stored as a prerequisite to act on it, then the message paradigm becomes semantically idle.
Let's imagine a matrix object. There are several ways to keep their entries: a one-dimensional array, an array of rows, an array of columns, a dictionary of (sparse) non-null entries, a triangular structure if the matrices are known to be symmetric/anti-symmetric/Hermitian, and a lot more for special cases. Of course, this variety makes no sense for the problem at hand and, in any case, it would be a bad idea to spend time considering the most general approach: In Smalltalk, generality is attained at the message, not at the storage.
Regardless of the internal organization of data, our objects should always offer protocols that are independent of the underlying structure. Back to the matrix example, even if our initial organization is an array of rows, the matrix object should work the same whether rows are arrays or more sophisticated vector objects that are also used for other ends. This means that when coding the internal access to entry (i,j) we should pretend we don't know the class of row i but only the message to access its jth element. Something on the lines of
atRow: i column: j
| row |
row := self row: i.
^row at: j
Here we are not assuming that row is an Array; we are only assuming that it understands the at: message, which is the least we can assume when talking to a row object, whatever its actual nature. Of course, this code is only good under the assumption that rows are not recreated on the fly, as they would, had our class kept instead collections of columns. But this is ok, otherwise we would only need to add another class and override this and some few other low level messages.
In any case, the idea is to defer as much as possible any explicit knowledge about the internal organization so that it gets confined to a few private messages. One way to test we are applying this good practice is to make sure no low level code repeats in two or more methods. For instance, the use of the row: message above moves low level code away from atRow:column:, deferring it to another that makes sense for the (ideal) matrix protocol.
This example illustrates an important point: be suspicious of any code that needs to compose two at: messages. And --why not-- enjoy the beauty of not having to declare types.
So, the problem was in brackets.
^(testArr at: 1) at:1
returns
1A
as I needed.

Why are lists used infrequently in Go?

Is there a way to create an array/slice in Go without a hard-coded array size? Why is List ignored?
In all the languages I've worked with extensively: Delphi, C#, C++, Python - Lists are very important because they can be dynamically resized, as opposed to arrays.
In Golang, there is indeed a list.Liststruct, but I see very little documentation about it - whether in Go By Example or the three Go books that I have - Summerfield, Chisnal and Balbaert - they all spend a lot of time on arrays and slices and then skip to maps. In souce code examples I also find little or no use of list.List.
It also appears that, unlike Python, Range is not supported for List - big drawback IMO. Am I missing something?
Slices are lovely, but they still need to be based on an array with a hard-coded size. That's where List comes in.
Just about always when you are thinking of a list - use a slice instead in Go. Slices are dynamically re-sized. Underlying them is a contiguous slice of memory which can change size.
They are very flexible as you'll see if you read the SliceTricks wiki page.
Here is an excerpt :-
Copy
b = make([]T, len(a))
copy(b, a) // or b = append([]T(nil), a...)
Cut
a = append(a[:i], a[j:]...)
Delete
a = append(a[:i], a[i+1:]...) // or a = a[:i+copy(a[i:], a[i+1:])]
Delete without preserving order
a[i], a = a[len(a)-1], a[:len(a)-1]
Pop
x, a = a[len(a)-1], a[:len(a)-1]
Push
a = append(a, x)
Update: Here is a link to a blog post all about slices from the go team itself, which does a good job of explaining the relationship between slices and arrays and slice internals.
I asked this question a few months ago, when I first started investigating Go. Since then, every day I have been reading about Go, and coding in Go.
Because I did not receive a clear-cut answer to this question (although I had accepted one answer) I'm now going to answer it myself, based on what I have learned, since I asked it:
Is there a way to create an array /slice in Go without a hard coded
array size?
Yes. Slices do not require a hard coded array to slice from:
var sl []int = make([]int, len, cap)
This code allocates slice sl, of size len with a capacity of cap - len and cap are variables that can be assigned at runtime.
Why is list.List ignored?
It appears the main reasons list.List seem to get little attention in Go are:
As has been explained in #Nick Craig-Wood's answer, there is
virtually nothing that can be done with lists that cannot be done
with slices, often more efficiently and with a cleaner, more
elegant syntax. For example the range construct:
for i := range sl {
sl[i] = i
}
cannot be used with list - a C style for loop is required. And in
many cases, C++ collection style syntax must be used with lists:
push_back etc.
Perhaps more importantly, list.List is not strongly typed - it is very similar to Python's lists and dictionaries, which allow for mixing various types together in the collection. This seems to run contrary
to the Go approach to things. Go is a very strongly typed language - for example, implicit type conversions never allowed in Go, even an upCast from int to int64 must be
explicit. But all the methods for list.List take empty interfaces -
anything goes.
One of the reasons that I abandoned Python and moved to Go is because
of this sort of weakness in Python's type system, although Python
claims to be "strongly typed" (IMO it isn't). Go'slist.Listseems to
be a sort of "mongrel", born of C++'s vector<T> and Python's
List(), and is perhaps a bit out of place in Go itself.
It would not surprise me if at some point in the not too distant future, we find list.List deprecated in Go, although perhaps it will remain, to accommodate those rare situations where, even using good design practices, a problem can best be solved with a collection that holds various types. Or perhaps it's there to provide a "bridge" for C family developers to get comfortable with Go before they learn the nuances of slices, which are unique to Go, AFAIK. (In some respects slices seem similar to stream classes in C++ or Delphi, but not entirely.)
Although coming from a Delphi/C++/Python background, in my initial exposure to Go I found list.List to be more familiar than Go's slices, as I have become more comfortable with Go, I have gone back and changed all my lists to slices. I haven't found anything yet that slice and/or map do not allow me to do, such that I need to use list.List.
I think that's because there's not much to say about them as the container/list package is rather self-explanatory once you absorbed what is the chief Go idiom for working with generic data.
In Delphi (without generics) or in C you would store pointers or TObjects in the list, and then cast them back to their real types when obtaining from the list. In C++ STL lists are templates and hence parameterized by type, and in C# (these days) lists are generic.
In Go, container/list stores values of type interface{} which is a special type capable to represent values of any other (real) type—by storing a pair of pointers: one to the type info of the contained value, and a pointer to the value (or the value directly, if it's size is no greater than the size of a pointer). So when you want to add an element to the list, you just do that as function parameters of type interface{} accept values coo any type. But when you extract values from the list, and what to work with their real types you have to either type-asert them or do a type switch on them—both approaches are just different ways to do essentially the same thing.
Here is an example taken from here:
package main
import ("fmt" ; "container/list")
func main() {
var x list.List
x.PushBack(1)
x.PushBack(2)
x.PushBack(3)
for e := x.Front(); e != nil; e=e.Next() {
fmt.Println(e.Value.(int))
}
}
Here we obtain an element's value using e.Value() and then type-assert it as int a type of the original inserted value.
You can read up on type assertions and type switches in "Effective Go" or any other introduction book. The container/list package's documentation summaries all the methods lists support.
Note that Go slices can be expanded via the append() builtin function. While this will sometimes require making a copy of the backing array, it won't happen every time, since Go will over-size the new array giving it a larger capacity than the reported length. This means that a subsequent append operation can be completed without another data copy.
While you do end up with more data copies than with equivalent code implemented with linked lists, you remove the need to allocate elements in the list individually and the need to update the Next pointers. For many uses the array based implementation provides better or good enough performance, so that is what is emphasised in the language. Interestingly, Python's standard list type is also array backed and has similar performance characteristics when appending values.
That said, there are cases where linked lists are a better choice (e.g. when you need to insert or remove elements from the start/middle of a long list), and that is why a standard library implementation is provided. I guess they didn't add any special language features to work with them because these cases are less common than those where slices are used.
From: https://groups.google.com/forum/#!msg/golang-nuts/mPKCoYNwsoU/tLefhE7tQjMJ
It depends a lot on the number of elements in your lists,
whether a true list or a slice will be more efficient
when you need to do many deletions in the 'middle' of the list.
#1
The more elements, the less attractive a slice becomes.
#2
When the ordering of the elements isn't important,
it is most efficient to use a slice and
deleting an element by replacing it by the last element in the slice and
reslicing the slice to shrink the len by 1
(as explained in the SliceTricks wiki)
So
use slice
1. If order of elements in list is Not important, and you need to delete, just
use List swap the element to delete with last element, & re-slice to (length-1)
2. when elements are more (whatever more means)
There are ways to mitigate the deletion problem --
e.g. the swap trick you mentioned or
just marking the elements as logically deleted.
But it's impossible to mitigate the problem of slowness of walking linked lists.
So
use slice
1. If you need speed in traversal
Unless the slice is updated way too often (delete, add elements at random locations) the memory contiguity of slices will offer excellent cache hit ratio compared to linked lists.
Scott Meyer's talk on the importance of cache..
https://www.youtube.com/watch?v=WDIkqP4JbkE
list.List is implemented as a doubly linked list. Array-based lists (vectors in C++, or slices in golang) are better choice than linked lists in most conditions if you don't frequently insert into the middle of the list. The amortized time complexity for append is O(1) for both array list and linked list even though array list has to extend the capacity and copy over existing values. Array lists have faster random access, smaller memory footprint, and more importantly friendly to garbage collector because of no pointers inside the data structure.

In Lua, how should I handle a zero-based array index which comes from C?

Within C code, I have an array and a zero-based index used to lookup within it, for example:
char * names[] = {"Apple", "Banana", "Carrot"};
char * name = names[index];
From an embedded Lua script, I have access to index via a getIndex() function and would like to replicate the array lookup. Is there an agreed on "best" method for doing this, given Lua's one-based arrays?
For example, I could create a Lua array with the same contents as my C array, but this would require adding 1 when indexing:
names = {"Apple", "Banana", "Carrot"}
name = names[getIndex() + 1]
Or, I could avoid the need to add 1 by using a more complex table, but this would break things like #names:
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
What approach is recommended?
Edit: Thank you for the answers so far. Unfortunately the solution of adding 1 to the index within the getIndex function is not always applicable. This is because in some cases indices are "well-known" - that is, it may be documented that an index of 0 means "Apple" and so on. In that situation, should one or the other of the above solutions be preferred, or is there a better alternative?
Edit 2: Thanks again for the answers and comments, they have really helped me think about this issue. I have realized that there may be two different scenarios in which the problem occurs, and the ideal solution may be different for each.
In the first case consider, for example, an array which may differ from time to time and an index which is simply relative to the current array. Indices have no meaning outside the code. Doug Currie and RBerteig are absolutely correct: the array should be 1-based and getIndex should contain a +1. As was mentioned, this allows the code on both the C and Lua sides to be idiomatic.
The second case involves indices which have meaning, and probably an array which is always the same. An extreme example would be where names contains "Zero", "One", "Two". In this case, the expected value for each index is well-known, and I feel that making the index on the Lua side one-based is unintuitive. I believe one of the other approaches should be preferred.
Use 1-based Lua tables, and bury the + 1 inside the getIndex function.
I prefer
names = {[0] = "Apple", "Banana", "Carrot"}
name = names[getIndex()]
Some of table-manipulation features - #, insert, remove, sort - are broken.
Others - concat(t, sep, 0), unpack(t, 0) - require explicit starting index to run correctly:
print(table.concat(names, ',', 0)) --> Apple,Banana,Carrot
print(unpack(names, 0)) --> Apple Banana Carrot
I hate constantly remembering of that +1 to cater Lua's default 1-based indices style.
You code should reflect your domain specific indices to be more readable.
If 0-based indices are fit well for your task, you should use 0-based indices in Lua.
I like how array indices are implemented in Pascal: you are absolutely free to choose any range you want, e.g., array[-10..-5]of byte is absolutely OK for an array of 6 elements.
This is where Lua metemethods and metatables come in handy. Using a table proxy and a couple metamethods, you can modify access to the table in a way that would fit your need.
local names = {"Apple", "Banana", "Carrot"} -- Original Table
local _names = names -- Keep private access to the table
local names = {} -- Proxy table, used to capture all accesses to the original table
local mt = {
__index = function (t,k)
return _names[k+1] -- Access the original table
end,
__newindex = function (t,k,v)
_names[k+1] = v -- Update original table
end
}
setmetatable(names, mt)
So what's going on here, is that the original table has a proxy for itself, then the proxy catches every access attempt at the table. When the table is accessed, it increment the value it was accessed by, simulating a 0-based array. Here are the print result:
print(names[0]) --> Apple
print(names[1]) --> Banana
print(names[2]) --> Carrot
print(names[3]) --> nil
names[3] = "Orange" --Add a new field to the table
print(names[3]) --> Orange
All table operations act just as they would normally. With this method you don't have to worry about messing with any unordinary access to the table.
EDIT: I'd like to point out that the new "names" table is merely a proxy to access the original names table. So if you queried for #names the result would be nil because that table itself has no values. You'd need to query for #_names to access the size of the original table.
EDIT 2: As Charles Stewart pointed out in the comment below, you can add a __len metamethod to the mt table to ensure the #names call gives you the correct results.
First of all, this situation is not unique to applications that mix Lua and C; you can face the same question even when using Lua only apps. To provide an example, I'm using an editor component that indexes lines starting from 0 (yes, it's C-based, but I only use its Lua interface), but the lines in the script that I edit in the editor are 1-based. So, if the user sets a breakpoint on line 3 (starting from 0 in the editor), I need to send a command to the debugger to set it on line 4 in the script (and convert back when the breakpoint is hit).
Now the suggestions.
(1) I personally dislike using [0] hack for arrays as it breaks too many things. You and Egor already listed many of them; most importantly for me it breaks # and ipairs.
(2) When using 1-based arrays I try to avoid indexing them and to use iterators as much as possible: for i, v in ipairs(...) do instead of for i = 1, #array do).
(3) I also try to isolate my code that deals with these conversions; for example, if you are converting between lines in the editor to manage markers and lines in the script, then have marker2script and script2marker functions that do the conversion (even if it's simple +1 and -1 operations). You'd have something like this anyway even without +1/-1 adjustments, it would just be implicit.
(4) If you can't hide the conversion (and I agree, +1 may look ugly), then make it even more noticeable: use c2l and l2c calls that do the conversion. In my opinion it's not as ugly as +1/-1, but has the advantage of communicating the intent and also gives you an easy way to search for all the places where the conversion happens. It's very useful when you are looking for off-one bugs or when API changes cause updates to this logic.
Overall, I wouldn't worry about these aspects too much. I'm working on a fairly complex Lua app that wraps several 0-based C components and don't remember any issues caused by different indexing...
Why not just turn the C-array into a 1-based array as well?
char * names[] = {NULL, "Apple", "Banana", "Carrot"};
char * name = names[index];
Frankly, this will lead to some unintuitive code on the C-side, but if you insist that there must be 'well-known' indices that work in both sides, this seems to be the best option.
A cleaner solution is of course not to make those 'well-known' indices part of the interface. For example, you could use named identifiers instead of plain numbers. Enums are a nice match for this on the C side, while in Lua you could even use strings as table keys.
Another possibility is to encapsulate the table behind an interface so that the user never accesses the array directly but only via a C-function call, which can then perform arbitrarily complex index transformations. Then you only need to expose that C function in Lua and you have a clean and maintainable solution.
Why not present your C array to Lua as userdata? The technique is described with code in PiL, section 'Userdata'; you can set the __index, __newindex, and __len metatable methods, and you can inherit from a class to provide other sequence manipulation functions as regular methods (e.g., define an array with array.remove, array.sort, array.pairs functions, which can be defined as object methods by a further tweak to __index). Doing things this way means you have no "synchronisation" issues between Lua and C, and it avoids risks that "array" tables get treated as ordinary tables resulting in off-by-one errors.
You can fix this lua-flaw by using an iterator that is aware of different index bases:
function iarray(a)
local n = 0
local s = #a
if a[0] ~= nil then
n = -1
end
return function()
n = n + 1
if n <= s then return n,a[n] end
end
end
However, you still have to add the zeroth element manually:
Usage example:
myArray = {1,2,3,4,5}
myArray[0] = 0
for _,e in iarray(myArray) do
-- do something with element e
end

What is the actual definition of an array? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Arrays, What’s the point?
I tried to ask this question before in What is the difference between an array and a list? but my question was closed before reaching a conclusive answer (more about that).
I'm trying to understand what is really meant by the word "array" in computer science. I am trying to reach an answer not have a discussion as per the spirit of this website. What I'm asking is language agnostic but you may draw on your knowledge of what arrays are/do in various languages that you've used.
Ways of thinking about this question:
Imagine you're designing a new programming language and you decide to implement arrays in it; what does that mean they do? What will the properties and capabilities of those things be. If it depends on the type of language, how so?
What makes an array an array?
When is an array not an array? When it is, for example, a list, vector, table, map, or collection?
It's possible there isn't one precise definition of what an array is, if that is the case then are there any standard or near-standard assumptions or what an array is? Are there any common areas at least? Maybe there are several definitions, if that is the case I'm looking for the most precision in each of them.
Language examples:
(Correct me if I'm wrong on any of these).
C arrays are contiguous blocks of memory of a single type that can be traversed using pointer arithmetic or accessed at a specific offset point. They have a fixed size.
Arrays in JavaScript, Ruby, and PHP, have a variable size and can store an object/scalar of any type they can also grow or have elements removed from them.
PHP arrays come in two types: numeric and associative. Associative arrays have elements that are stored and retrieved with string keys. Numeric arrays have elements that are stored and retrieved with integers. Interestingly if you have: $eg = array('a', 'b', 'c') and you unset($eg[1]) you still retrieve 'c' with $eg[2], only now $eg[1] is undefined. (You can call array_values() to re-index the array). You can also mix string and integer keys.
At this stage of sort of suspecting that C arrays are the only true array here and that strictly-speaking for an array to be an array it has to have all the characteristics I mention in that first bullet point. If that's the case then — again these are suspicions that I'm looking to have confirmed or rejected — arrays in JS and Ruby are actually vectors, and PHP arrays are probably tables of some kind.
Final note: I've made this community wiki so if answers need to be edited a few times in lieu of comments, go ahead and do that. Consensus is in order here.
It is, or should be, all about abstraction
There is actually a good question hidden in there, a really good one, and it brings up a language pet peeve I have had for a long time.
And it's getting worse, not better.
OK: there is something lowly and widely disrespected Fortran got right that my favorite languages like Ruby still get wrong: they use different syntax for function calls, arrays, and attributes. Exactly how abstract is that? In fortran function(1) has the same syntax as array(1), so you can change one to the other without altering the program. (I know, not for assignments, and in the case of Fortran it was probably an accident of goofy punch card character sets and not anything deliberate.)
The point is, I'm really not sure that x.y, x[y], and x(y) should have different syntax. What is the benefit of attaching a particular abstraction to a specific syntax? To make more jobs for IDE programmers working on refactoring transformations?
Having said all that, it's easy to define array. In its first normal form, it's a contiguous sequence of elements in memory accessed via a numeric offset and using a language-specific syntax. In higher normal forms it is an attribute of an object that responds to a typically-numeric message.
array |əˈrā|
noun
1 an impressive display or range of a particular type of thing : there is a vast array of literature on the topic | a bewildering array of choices.
2 an ordered arrangement, in particular
an arrangement of troops.
Mathematics: an arrangement of quantities or symbols in rows and columns; a matrix.
Computing: an ordered set of related elements.
Law: a list of jurors empaneled.
3 poetic/literary elaborate or beautiful clothing : he was clothed in fine array.
verb
[ trans. ] (usu. be arrayed) display or arrange (things) in a particular way : arrayed across the table was a buffet | the forces arrayed against him.
[ trans. ] (usu. be arrayed in) dress someone in (the clothes specified) : they were arrayed in Hungarian national dress.
[ trans. ] Law empanel (a jury).
ORIGIN Middle English (in the senses [preparedness] and [place in readiness] ): from Old French arei (noun), areer (verb), based on Latin ad- ‘toward’ + a Germanic base meaning ‘prepare.’
From FOLDOC:
array
1. <programming> A collection of identically typed data items
distinguished by their indices (or "subscripts"). The number
of dimensions an array can have depends on the language but is
usually unlimited.
An array is a kind of aggregate data type. A single
ordinary variable (a "scalar") could be considered as a
zero-dimensional array. A one-dimensional array is also known
as a "vector".
A reference to an array element is written something like
A[i,j,k] where A is the array name and i, j and k are the
indices. The C language is peculiar in that each index is
written in separate brackets, e.g. A[i][j][k]. This expresses
the fact that, in C, an N-dimensional array is actually a
vector, each of whose elements is an N-1 dimensional array.
Elements of an array are usually stored contiguously.
Languages differ as to whether the leftmost or rightmost index
varies most rapidly, i.e. whether each row is stored
contiguously or each column (for a 2D array).
Arrays are appropriate for storing data which must be accessed
in an unpredictable order, in contrast to lists which are
best when accessed sequentially. Array indices are
integers, usually natural numbers, whereas the elements of
an associative array are identified by strings.
2. <architecture> A processor array, not to be confused with
an array processor.
Also note that in some languages, when they say "array" they actually mean "associative array":
associative array
<programming> (Or "hash", "map", "dictionary") An array
where the indices are not just integers but may be
arbitrary strings.
awk and its descendants (e.g. Perl) have associative
arrays which are implemented using hash coding for faster
look-up.
If you ignore how programming languages model arrays and lists, and ignore the implementation details (and consequent performance characteristics) of the abstractions, then the concepts of array and list are indistinguishable.
If you introduce implementation details (still independent of programming language) you can compare data structures like linked lists, array lists, regular arrays, sparse arrays and so on. But then you are not longer comparing arrays and lists per se.
The way I see it, you can only talk about a distinction between arrays and lists in the context of a programming language. And of course you are then talking about arrays and lists as supported by that language. You cannot generalize to any other language.
In short, I think this question is based on a false premise, and has no useful answer.
EDIT: in response to Ollie's comments:
I'm not saying that it is not useful to use the words "array" and "list". What I'm saying is the words do not and cannot have precise and distinct definitions ... except in the context of a specific programming language. While you would like the two words to have distinct meaning, it is a fact that they don't. Just take a look at the way the words are actually used. Furthermore, trying to impose a new set of definitions on the world is doomed to fail.
My point about implementation is that when we compare and contrast the different implementations of arrays and lists, we are doing just that. I'm not saying that it is not a useful thing to do. What I am saying is that when we compare and contrast the various implementations we should not get all hung up about whether we call them arrays or lists or whatever. Rather we should use terms that we can agree on ... or not use terms at all.
To me, "array" means "ordered collection of things that is probably efficiently indexable" and "list" means "ordered collection of things that may be efficiently indexable". But there are examples of both arrays and lists that go against the trend; e.g. PHP arrays on the one hand, and Java ArrayLists on the other hand. So if I want to be precise ... in a language-agnostic context, I have to talk about "C-like arrays" or "linked lists" or some other terminology that makes it clear what data structure I really mean. The terms "array" and "list" are of no use if I want to be clear.
An array is an ordered collection of data items indexed by integer. It is not possible to be certain of anything more. Vote for this answer you believe this is the only reasonable outcome of this question.
An array:
is a finite collection of elements
the elements are ordered, and this is their only structure
elements of the same type
supported efficient random access
has no expectation of efficient insertions
may or may not support append
(1) differentiates arrays from things like iterators or generators. (2) differentiates arrays from sets. (3) differentiates arrays from things like tuples where you get an int and a string. (4) differentiates arrays from other types of lists. Maybe it's not always true, but a programmer's expectation is that random access is constant time. (5) and (6) are just there to deny additional requirements.
I would argue that a real array stores values in contiguous memory. Anything else is only called an array because it can be used like array, but they aren't really ("arrays" in PHP are definately not actual arrays (non-associative)). Vectors and such are extensions of arrays, adding additional functionality.
an array is a container, and the objects it holds have no any relationships except the order; the objects are stored in a continuous space abstractly (high level, of course low level may continuous too), so you could access them by slot[x,y,z...].
for example, per array[2,3,5,7,1], you could get 5 using slot[2] (slot[3] in some languages).
for a list, a container too, each object (well, each object-holder exactly such as slot or node) it holds has indicators which "point" to other object(s) and this is the main relationship; in general both high or low level the space is not continuous, but may be continuous; so accessing by slot[x,y,z...] is not recommended.
for example, per |-2-3-5-7-1-|, you need to do a travel from first object to 3rd one to get 5.

Resources