What is the proper term for the type of array that does not contain textual keys?
That is to say
$my_array[0], $my_array[1] etc. vs $my_array['some-key']
An 'indexed' array? Is there even such a term for this type?
It is simply an array, the associative array you are referring to is a language implementation that will use a different underlying data structure and is not even possible in C. By definition when you are referring to an array by index that is all it is.
In computer science, an array data structure or simply array is a data
structure consisting of a collection of elements (values or
variables), each identified by at least one index.
-Wikipedia
Numerical? Numerically Indexed? I don't think there's an agreed upon term. When you retrieve a php array it can be "ASSOC" for associative or "NUM" for numerical. So that's what I'm guess, someone should know what you mean if you used either of those.
a numerically indexed array: here
I think it's just an array, but w3schools.com uses the term "Numeric Arrays", which would clear up any ambiguity between the two types.
Related
In Rust a tuple can be indexed using a dot (e.g.: x.0), while an array can be indexed with square brackets (e.g.: x[0]). At first glance this seems to me as if it would make it harder to refactor existing code, without serving any actual purpose. However, I am probably just missing something. Did the creators of Rust ever comment on this and told us why they chose to build the language that way?
This tuple field access syntax was introduced in RFC 184 (discussion thread). Before that, you had to destructure all tuples (and tuple structs) to access their values, or use special traits in the standard library.
The RFC itself does not go into a lot of detail regarding the alternative [index] syntax, but the discussion thread does. I see three main reasons why we ended up with the .index syntax:
[] is related to std::ops::Index
The [index] syntax is strongly related to the ops::Index trait. It allows you to overload that operator for your own types. The way the trait is designed, the index method (which is called when you use []) has to return that same type every time. So ops::Index cannot be used for heterogeneous types. And since [] is very related to the trait, it might be strange to have a few special usages of [] that don't use std::ops::Index.
As also pointed out on reddit, indexing as such (i.e. tuple[0], tuple[1] etc.) wouldn't make sense as an alternative, because tuples are heterogenous. (They definitely couldn't be made to implement the Index* traits.)
— Comment
Indexing syntax [is] actually a really bad fit. Notably, indexing syntax everywhere else has a consistent type, but a tuple is heterogenous so a[0] and a[1] would have different types.
— Comment
Tuples/tuple structs as structs with anonymous fields
Rust has other heterogeneous data types: structs. And you access their fields with the .field syntax. And describing tuple structs as structs with unnamed fields, and describing tuples as unnamed tuple structs, it makes sense to treat both a bit like structs. Then, the .0 just feels like referencing an unnamed field.
I feel, especially in statically-typed languages, the types of a[1], a[2], ..., a[N] should be the same.
Yes, that's why nobody is advocating for adding indexing syntax to tuples. The tuple.1 syntax is a much better fit; it's basically anonymous field access, rather than indexing.
— Comment
Tuples and tuple structs are just structs with anonymous fields ordered by their definition, so both should support syntax for accessing a field as an lvalue directly to make working with them more consistent and easier.
— Comment
Influenced by Swift
Swift already had that syntax and seems to have been an influence:
For reference, Swift allows this tuple indexing syntax, and allows for assignment with it.
— Comment
+1 swift has a lot of nice pragmatic tweaks, and this is one of them, IMO.
— Comment
thanks to swift there's going to be a large community familiar with the .0 .1 ... notation
— Comment
As an aside: as can be seen by this thread, the "designers of Rust" really are, for the most part, just the community members. Sure, the RFC process had its problems back then (and it's still not perfect), but you can read most of that discussion online, with community members commenting on the proposed syntax.
I haven't seen any explicit mention of the reasoning behind the decision but it makes sense considering the original meaning of the indexing syntax. Dating back to C, arrays were contiguous meaning they stored data such that each byte of data began as the previous one left off. When you accessed the Nth element of an array, you were really accessing the data which started N * sizeof(type) bytes away from the start of the array.
---------------------
| 0 | 1 | 2 | 3 | 4 |
---------------------
This method only works when each value in the array is of the same type and therefore uses the same amount of memory, as is the case with Rust arrays and slices but not necessarily so with the tuple type. Had the designers allowed access to tuple data with indices it may have caused those learning Rust to make incorrect assumptions about the use of tuples.
Tuples are generally used to make simple "data classes" which only store data and do not need any associated methods. So you should consider a tuple a struct where all fields are public and accessed by number.
Here it says:
Arrays are useful mostly because the element indices can be computed
at run time. Among other things, this feature allows a single
iterative statement to process arbitrarily many elements of an array.
For that reason, the elements of an array data structure are required
to have the same size and should use the same data representation.
Is this still true for modern languages?
For example, Java, you can have an array of Objects or Strings, right? Each object or string can have different length. Do I misunderstand the above quote, or languages like Java implements Array differently? How?
In java all types except primitives are referenced types meaning they are a pointer to some memory location manipulated by JVM.
But there are mainly two types of programming languages, fixed-typed like Java and C++ and dynamically-typed like python and PHP. In fixed-typed languages your array should consist of the same types whether String, Object or ...
but in dynamically-typed ones there's a bit more abstraction and you can have different data types in array (I don't know the actual implementation though).
An array is a regular arrangement of data in memory. Think of an array of soldiers, all in a line, with exactly equal spacing between each man.
So they can be indexed by lookup from a base address. But all items have to be the same size. So if they are not, you store pointers or references to make them the same size. All languages use that underlying structure, except for what are sometimes called "associative arrays", indexed by key (strings usually), where you have what is called a hash table. Essentially the hash function converts the key into an array index, with a fix-up to resolve collisions.
I read over the docs for PostgreSQL v 9.3 arrays (http://www.postgresql.org/docs/9.3/static/arrays.html), but I don't see the question of ordering covered. Can someone confirm that Postgres preserves the insertion order/original order of an array when it's inserted into an array column? This seems to be the case but I would like absolute confirmation.
Thank you.
The documentation is entirely clear that arrays are useful in scenarios where order is important, inasmuch as it explicitly documents querying against specific positions within an array. If those positions were not reliable, these queries would have no meaning. (Using the word "array" is also clear on this point, being as it is a term of the art: An array is an ordered datatype by its nature; an unordered collection allowed to contain duplicates would be a bag, not an array, just as an unordered collection in which duplicates were not allowed would be a set).
See the examples given in section 8.1.4.3, of "pay by quarter", with index position within the array indicating which quarter is being queried against.
Cannot find in documentation, but I'm pretty sure. Yes, order is preserved. And [2,4,5] is different from [5,2,4].
In case I'm wrong, indexes cannot work.
I have a requirement to do a lookup based on a large number. The number could fall in the range 1 - 2^32. Based on the input, i need to return some other data structure. My question is that what data structure should i use to effectively hold this?
I would have used an array giving me O(1) lookup if the numbers were in the range say, 1 to 5000. But when my input number goes large, it becomes unrealistic to use an array as the memory requirements would be huge.
I am hence trying to look at a data structure that yields the result fast and is not very heavy.
Any clues anybody?
EDIT:
It would not make sense to use an array since i may have only 100 or 200 indices to store.
Abhishek
unordered_map or map, depending on what version of C++ you are using.
http://www.cplusplus.com/reference/unordered_map/unordered_map/
http://www.cplusplus.com/reference/map/map/
A simple solution in C, given you've stated at most 200 elements is just an array of structs with an index and a data pointer (or two arrays, one of indices and one of data pointers, where index[i] corresponds to data[i]). Linearly search the array looking for the index you want. With a small number of elements, (200), that will be very fast.
One possibility is a Judy Array, which is a sparse associative array. There is a C Implementation available. I don't have any direct experience of these, although they look interesting and could be worth experimenting with if you have the time.
Another (probably more orthodox) choice is a hash table. Hash tables are data structures which map keys to values, and provide fast lookup and insertion times (provided a good hash function is chosen). One thing they do not provide, however, is ordered traversal.
There are many C implementations. A quick Google search turned up uthash which appears to be suitable, particularly because it allows you to use any value type as the key (many implementations assume a string as the key). In your case you want to use an integer as the key.
This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Arrays, What’s the point?
I tried to ask this question before in What is the difference between an array and a list? but my question was closed before reaching a conclusive answer (more about that).
I'm trying to understand what is really meant by the word "array" in computer science. I am trying to reach an answer not have a discussion as per the spirit of this website. What I'm asking is language agnostic but you may draw on your knowledge of what arrays are/do in various languages that you've used.
Ways of thinking about this question:
Imagine you're designing a new programming language and you decide to implement arrays in it; what does that mean they do? What will the properties and capabilities of those things be. If it depends on the type of language, how so?
What makes an array an array?
When is an array not an array? When it is, for example, a list, vector, table, map, or collection?
It's possible there isn't one precise definition of what an array is, if that is the case then are there any standard or near-standard assumptions or what an array is? Are there any common areas at least? Maybe there are several definitions, if that is the case I'm looking for the most precision in each of them.
Language examples:
(Correct me if I'm wrong on any of these).
C arrays are contiguous blocks of memory of a single type that can be traversed using pointer arithmetic or accessed at a specific offset point. They have a fixed size.
Arrays in JavaScript, Ruby, and PHP, have a variable size and can store an object/scalar of any type they can also grow or have elements removed from them.
PHP arrays come in two types: numeric and associative. Associative arrays have elements that are stored and retrieved with string keys. Numeric arrays have elements that are stored and retrieved with integers. Interestingly if you have: $eg = array('a', 'b', 'c') and you unset($eg[1]) you still retrieve 'c' with $eg[2], only now $eg[1] is undefined. (You can call array_values() to re-index the array). You can also mix string and integer keys.
At this stage of sort of suspecting that C arrays are the only true array here and that strictly-speaking for an array to be an array it has to have all the characteristics I mention in that first bullet point. If that's the case then — again these are suspicions that I'm looking to have confirmed or rejected — arrays in JS and Ruby are actually vectors, and PHP arrays are probably tables of some kind.
Final note: I've made this community wiki so if answers need to be edited a few times in lieu of comments, go ahead and do that. Consensus is in order here.
It is, or should be, all about abstraction
There is actually a good question hidden in there, a really good one, and it brings up a language pet peeve I have had for a long time.
And it's getting worse, not better.
OK: there is something lowly and widely disrespected Fortran got right that my favorite languages like Ruby still get wrong: they use different syntax for function calls, arrays, and attributes. Exactly how abstract is that? In fortran function(1) has the same syntax as array(1), so you can change one to the other without altering the program. (I know, not for assignments, and in the case of Fortran it was probably an accident of goofy punch card character sets and not anything deliberate.)
The point is, I'm really not sure that x.y, x[y], and x(y) should have different syntax. What is the benefit of attaching a particular abstraction to a specific syntax? To make more jobs for IDE programmers working on refactoring transformations?
Having said all that, it's easy to define array. In its first normal form, it's a contiguous sequence of elements in memory accessed via a numeric offset and using a language-specific syntax. In higher normal forms it is an attribute of an object that responds to a typically-numeric message.
array |əˈrā|
noun
1 an impressive display or range of a particular type of thing : there is a vast array of literature on the topic | a bewildering array of choices.
2 an ordered arrangement, in particular
an arrangement of troops.
Mathematics: an arrangement of quantities or symbols in rows and columns; a matrix.
Computing: an ordered set of related elements.
Law: a list of jurors empaneled.
3 poetic/literary elaborate or beautiful clothing : he was clothed in fine array.
verb
[ trans. ] (usu. be arrayed) display or arrange (things) in a particular way : arrayed across the table was a buffet | the forces arrayed against him.
[ trans. ] (usu. be arrayed in) dress someone in (the clothes specified) : they were arrayed in Hungarian national dress.
[ trans. ] Law empanel (a jury).
ORIGIN Middle English (in the senses [preparedness] and [place in readiness] ): from Old French arei (noun), areer (verb), based on Latin ad- ‘toward’ + a Germanic base meaning ‘prepare.’
From FOLDOC:
array
1. <programming> A collection of identically typed data items
distinguished by their indices (or "subscripts"). The number
of dimensions an array can have depends on the language but is
usually unlimited.
An array is a kind of aggregate data type. A single
ordinary variable (a "scalar") could be considered as a
zero-dimensional array. A one-dimensional array is also known
as a "vector".
A reference to an array element is written something like
A[i,j,k] where A is the array name and i, j and k are the
indices. The C language is peculiar in that each index is
written in separate brackets, e.g. A[i][j][k]. This expresses
the fact that, in C, an N-dimensional array is actually a
vector, each of whose elements is an N-1 dimensional array.
Elements of an array are usually stored contiguously.
Languages differ as to whether the leftmost or rightmost index
varies most rapidly, i.e. whether each row is stored
contiguously or each column (for a 2D array).
Arrays are appropriate for storing data which must be accessed
in an unpredictable order, in contrast to lists which are
best when accessed sequentially. Array indices are
integers, usually natural numbers, whereas the elements of
an associative array are identified by strings.
2. <architecture> A processor array, not to be confused with
an array processor.
Also note that in some languages, when they say "array" they actually mean "associative array":
associative array
<programming> (Or "hash", "map", "dictionary") An array
where the indices are not just integers but may be
arbitrary strings.
awk and its descendants (e.g. Perl) have associative
arrays which are implemented using hash coding for faster
look-up.
If you ignore how programming languages model arrays and lists, and ignore the implementation details (and consequent performance characteristics) of the abstractions, then the concepts of array and list are indistinguishable.
If you introduce implementation details (still independent of programming language) you can compare data structures like linked lists, array lists, regular arrays, sparse arrays and so on. But then you are not longer comparing arrays and lists per se.
The way I see it, you can only talk about a distinction between arrays and lists in the context of a programming language. And of course you are then talking about arrays and lists as supported by that language. You cannot generalize to any other language.
In short, I think this question is based on a false premise, and has no useful answer.
EDIT: in response to Ollie's comments:
I'm not saying that it is not useful to use the words "array" and "list". What I'm saying is the words do not and cannot have precise and distinct definitions ... except in the context of a specific programming language. While you would like the two words to have distinct meaning, it is a fact that they don't. Just take a look at the way the words are actually used. Furthermore, trying to impose a new set of definitions on the world is doomed to fail.
My point about implementation is that when we compare and contrast the different implementations of arrays and lists, we are doing just that. I'm not saying that it is not a useful thing to do. What I am saying is that when we compare and contrast the various implementations we should not get all hung up about whether we call them arrays or lists or whatever. Rather we should use terms that we can agree on ... or not use terms at all.
To me, "array" means "ordered collection of things that is probably efficiently indexable" and "list" means "ordered collection of things that may be efficiently indexable". But there are examples of both arrays and lists that go against the trend; e.g. PHP arrays on the one hand, and Java ArrayLists on the other hand. So if I want to be precise ... in a language-agnostic context, I have to talk about "C-like arrays" or "linked lists" or some other terminology that makes it clear what data structure I really mean. The terms "array" and "list" are of no use if I want to be clear.
An array is an ordered collection of data items indexed by integer. It is not possible to be certain of anything more. Vote for this answer you believe this is the only reasonable outcome of this question.
An array:
is a finite collection of elements
the elements are ordered, and this is their only structure
elements of the same type
supported efficient random access
has no expectation of efficient insertions
may or may not support append
(1) differentiates arrays from things like iterators or generators. (2) differentiates arrays from sets. (3) differentiates arrays from things like tuples where you get an int and a string. (4) differentiates arrays from other types of lists. Maybe it's not always true, but a programmer's expectation is that random access is constant time. (5) and (6) are just there to deny additional requirements.
I would argue that a real array stores values in contiguous memory. Anything else is only called an array because it can be used like array, but they aren't really ("arrays" in PHP are definately not actual arrays (non-associative)). Vectors and such are extensions of arrays, adding additional functionality.
an array is a container, and the objects it holds have no any relationships except the order; the objects are stored in a continuous space abstractly (high level, of course low level may continuous too), so you could access them by slot[x,y,z...].
for example, per array[2,3,5,7,1], you could get 5 using slot[2] (slot[3] in some languages).
for a list, a container too, each object (well, each object-holder exactly such as slot or node) it holds has indicators which "point" to other object(s) and this is the main relationship; in general both high or low level the space is not continuous, but may be continuous; so accessing by slot[x,y,z...] is not recommended.
for example, per |-2-3-5-7-1-|, you need to do a travel from first object to 3rd one to get 5.