Difference between list and arrays - arrays

Are lists and arrays different in python?
Many articles refer to the following as an array: ar = [0]*N, which holds N elements. That is a list. I'm not sure whether these words are used interchangeably in Python. And then there is a module called array.

The terms "list" and "array" are not interchangeable in either Python or computer science (CS).
In CS an array data structure is defined, as you've already noted, as a contiguous block of data elements all of which are the same size. Python's list does not conform to this definition as each element of a Python list can not only be a completely different data type, but can even be another data structure type such as list, dictionary, or set.
Python stores each element of a list as a separate data object, and the references to those objects are stored in Python's list data type. In CPython those references are kept in a contiguous, resizable array, not in a hash table or a linked list, which is why indexing by position takes constant time.
Python's lists are ordered, and you can use and access a Python list as if it were an array, as I frequently do, but don't be confused by Python's use of square brackets: it's not an array in the sense defined above.
Python's array module implements an actual array data type for storing numerical data (it also provides a type code for Unicode characters).
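A minimal sketch of the difference in practice, using only the standard library. The variable names are illustrative: a list accepts mixed objects, while array.array enforces the single C type selected by its type code.

```python
from array import array

# A list holds references to arbitrary objects, so mixed types are fine.
mixed = [0, "text", 3.5, [1, 2]]

# array.array stores raw values of one C type, chosen by a type code.
ints = array("i", [1, 2, 3])  # "i" = signed C int
ints.append(4)

try:
    ints.append("five")  # not an integer, so the array rejects it
    rejected = False
except TypeError:
    rejected = True

print(mixed)
print(ints.tolist(), ints.itemsize, rejected)
```

The itemsize attribute reports how many bytes each element occupies, which is the "same size" property the CS definition asks for.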

Related

Are there Erlang arrays "with a defined representation"?

Context:
Erlang programs running on heterogeneous nodes, retrieving and storing data
from Mnesia databases. These database entries are meant to be used for a long
time (e.g. across multiple Erlang version releases) and remain in the form of
Erlang objects (i.e. no serialization). Among the information stored, there are
currently two uses for arrays:
Large (up to 16384 elements) arrays. Fast access to an element
using its index was the basis for choosing this type of collection.
Once the array has been created, the elements are never modified.
Small (up to 64 elements) arrays. Accesses are mostly done using indices, but there are also some iterations (foldl/foldr). Both reading and replacement of the elements are done frequently. The size of the collection remains constant.
Problem:
Erlang's documentation on arrays states that "The representation is not
documented and is subject to change without notice." Clearly, arrays should not be used in my context: database entries containing arrays may be
interpreted differently depending on the node executing the program and
unannounced changes to how arrays are implemented would make them unusable.
I have noticed that Erlang features "ordsets"/"orddict" to address a similar
issue with "sets"/"dict", and am thus looking for the "array" equivalent. Do you know of any? If none exists, my strategy is likely going to be using lists of lists to replace my large arrays, and orddict (with the index as key) to replace the smaller ones. Is there a better solution?
An array is a tuple of nested tuples and integers, with each tuple having a fixed size of 10 and representing a segment of cells. Where a segment is not currently used, an integer (10) acts as a placeholder. Without the abstraction, this is, I suppose, the closest equivalent. You could indeed copy the array module from OTP into your own app, and it would then be a stable representation.
As to what you should use instead of array, that depends on the data and what you will do with it. If the data that would be in your array is fixed, then a tuple makes sense: it has constant access time for reads/lookups. Otherwise a list sounds like a winner, be it a list of lists, a list of tuples, etc. However, once again, that's a shot in the dark, because I don't know your data or how you use it.
See the implementation here: https://github.com/erlang/otp/blob/master/lib/stdlib/src/array.erl
Also see Robert Virding's answer on the implementation of array here: Arrays implementation in erlang
And what Fred Hebert says about the array in A Short Visit to Common Data Structures
An example showing the structure of an array:
1> A1 = array:new(30).
{array,30,0,undefined,100}
2> A2 = array:set(0, true, A1).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
10,10,10,10,10,10,10,10,10,10}}
3> A3 = array:set(19, true, A2).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
{undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,true},
10,10,10,10,10,10,10,10,10}}
4>

How come retrieving an element from a list is O(1) [duplicate]

Today in class, we learned that retrieving an element from a list is O(1) in Python. Why is this the case? Suppose I have a list of four items, for example:
li = ["perry", 1, 23.5, "s"]
These items have different sizes in memory. And so it is not possible to take the memory location of li[0] and add three times the size of each element to get the memory location of li[3]. So how does the interpreter know where li[3] is without having to traverse the list in order to retrieve the element?
A list in Python is implemented as an array of pointers [1]. So, what's really happening when you create the list:
["perry", 1, 23.5, "s"]
is that you are actually creating an array of pointers like so:
[0xa3d25342, 0x635423fa, 0xff243546, 0x2545fade]
Each pointer "points" to the respective objects in memory, so that the string "perry" will be stored at address 0xa3d25342 and the number 1 will be stored at 0x635423fa, etc.
Since all pointers are the same size, the interpreter can in fact add 3 times the size of a pointer to the address of li[0] to get to the pointer stored at li[3].
[1] Get more details from the horse's mouth (CPython source code on GitHub).
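One rough way to observe this from Python itself, assuming CPython (sys.getsizeof is implementation-specific): the size reported for a list depends on how many pointers it holds, not on how large the objects behind those pointers are. The names below are made up for illustration.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this platform

# Two lists of equal length, built the same way, with very different elements.
small_items = [1] * 1000
big_items = ["x" * 10_000] * 1000

# getsizeof counts the list object and its pointer array only,
# not the objects that the pointers refer to.
size_small = sys.getsizeof(small_items)
size_big = sys.getsizeof(big_items)

print(POINTER_SIZE, size_small, size_big)
```

Both sizes come out equal, because each list stores 1000 same-sized pointers regardless of what they point to.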
When you say a = [...], a is effectively a pointer to a PyObject containing an array of pointers to PyObjects.
When you ask for a[2], the interpreter first follows the pointer to the list's PyObject, then adds 2 times the pointer size to the address of the array inside it, then returns the pointer stored there. The same happens if you ask for a[0] or a[9999].
Basically, all Python objects are accessed by reference instead of by value, even integer literals like 2. There are just some tricks in the pointer system to keep this all efficient. And pointers have a known size, so they can be stored conveniently in C-style arrays.
Short answer: Python lists are arrays.
Long answer: The computer science term list usually means either a singly-linked list (as used in functional programming) or a doubly-linked list (as used in procedural programming). These data structures support O(1) insertion at either the head of the list (functionally) or at any position that does not need to be searched for (procedurally). A Python "list" has none of these characteristics. Instead it supports (amortized) O(1) appending at the end of the list (like a C++ std::vector or Java ArrayList). Python lists are really resizable arrays in CS terms.
The following comment from the Python documentation explains some of the performance characteristics of Python "lists":
It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
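The same part of the Python documentation goes on to recommend collections.deque for queues, since it supports fast appends and pops at both ends. A minimal sketch (the job names are placeholders):

```python
from collections import deque

queue = deque()
for job in ["a", "b", "c"]:
    queue.append(job)  # enqueue at the right end

# popleft() removes from the front without shifting the other elements,
# unlike list.pop(0).
first = queue.popleft()
second = queue.popleft()
remaining = list(queue)

print(first, second, remaining)
```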

Data structure - Array

Here it says:
Arrays are useful mostly because the element indices can be computed
at run time. Among other things, this feature allows a single
iterative statement to process arbitrarily many elements of an array.
For that reason, the elements of an array data structure are required
to have the same size and should use the same data representation.
Is this still true for modern languages?
For example, Java, you can have an array of Objects or Strings, right? Each object or string can have different length. Do I misunderstand the above quote, or languages like Java implements Array differently? How?
In Java, all types except primitives are reference types, meaning a variable of such a type holds a pointer to some memory location managed by the JVM.
Broadly, there are two kinds of programming languages: statically typed ones like Java and C++, and dynamically typed ones like Python and PHP. In statically typed languages an array must consist of elements of the same declared type, whether String, Object, or something else,
but in dynamically typed ones there's a bit more abstraction and you can have different data types in one array (I don't know the actual implementation, though).
An array is a regular arrangement of data in memory. Think of an array of soldiers, all in a line, with exactly equal spacing between each man.
So they can be indexed by lookup from a base address. But all items have to be the same size. So if they are not, you store pointers or references to make them the same size. All languages use that underlying structure, except for what are sometimes called "associative arrays", indexed by key (strings usually), where you have what is called a hash table. Essentially the hash function converts the key into an array index, with a fix-up to resolve collisions.
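The last point can be sketched as a toy hash table. This is an illustration only, not how any particular language implements its associative arrays: the class name, the bucket count, and the choice of chaining as the collision "fix-up" are all assumptions made for the example.

```python
class ToyHashTable:
    """Illustrative only: a fixed array of buckets, collisions chained in lists."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # The hash function turns the key into an index into the bucket array.
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key, or a collision: chain it

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable(bucket_count=2)  # tiny on purpose, to force collisions
for name, age in [("ann", 31), ("bob", 27), ("cy", 45)]:
    table.set(name, age)
table.set("bob", 28)

print(table.get("ann"), table.get("bob"), table.get("cy"))
```

With three keys and only two buckets, at least two keys must share a bucket, so the chained lookup is exercised on every run.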

Python "array-type" data for beginners

I have just started Python coding, after having some experience with scripting languages (BASH + 2 code-based programmes, SAC and FLAC). So I have a reasonable understanding of basic code structure, loops and so on. My work so far consists mostly of reorganizing and shuffling data between various tables, looking up data from one table based on values from another and so on.
However, I am getting a bit overwhelmed by all the possible treatments of the data and 2D data in particular - lists of lists, numpy arrays, numpy record arrays and so on, each of them with different ways to load them from a file, access them and modify them.
Do you know of a summary (preferably for dummies) of what the possible data types are and how to treat them, access them and switch between them?
If it's google-able, then I haven't searched sufficiently and I apologise.
Cheers
Vhailor
There are three common array types I'll mention here: list and tuple, which are built-in and documented here (along with some others), and numpy.array.
List
Lists are built-in, mutable objects that can store any Python objects, including other lists, tuples, and numpy arrays. List literals are written with square brackets ([1,2,3,4]), and they can be indexed (starting from zero) with square brackets:
a = [1,2,3,4]
print(a[1]) # 2
Tuple
Tuples are like lists, but they are written with parentheses ((1,2,3,4)) and are immutable (they can't be modified). They can be faster than lists for some operations.
a = (1,2,3,4)
a[1] += 1 # raises a TypeError
You can convert from a tuple to a list by passing it as an argument to the built-in list() function, and you can convert the other way with tuple().
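A short sketch of that round trip (variable names are illustrative): converting makes a new object, so the original tuple is left untouched.

```python
t = (1, 2, 3, 4)

as_list = list(t)      # a new, mutable list with the same items
as_list[1] += 1        # allowed on a list

back = tuple(as_list)  # a new, immutable tuple

print(as_list, back, t)
```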
NumPy array objects
NumPy array objects are not built-in; they are part of NumPy. They're created with numpy.array(), which takes any iterable object (lists and tuples are iterable) and returns a NumPy array object with the same data:
import numpy as np
a = np.array([1,2,3,4])
NumPy arrays are implemented in C and are generally faster than lists for numerical work on large amounts of data, and NumPy implements a bunch of useful functions for manipulating them (documented in the docs I linked above).
About saving and loading them, I recently answered a question about saving NumPy arrays, and all of the methods I mentioned there will work with all three of these array types.

Is data storage type different between linked list and array?

Recently I got stuck when solving the following problem:
What is the difference between a linked list and an array?
A. Search complexity when both are sorted
B. Dynamically add/remove
C. Random access efficiency
D. Data storage type
I know A,B and C are correct, but I feel confused about D. Any help will be appreciated.
Actually, there is no difference between them from the perspective of data storage type. (Perhaps this should be qualified as being from the perspective of a statically typed programming language.) You can put any struct or any object into either. The key point is that in most programming languages both are homogeneous, which means you can store only one type in them. However, in a linked list each node also stores a pointer to the next element, so if you construct your own linked list you can put any type into it as long as you keep the pointer to the next element. In arrays, by contrast, the elements are reached via pointer arithmetic, so they have to be of one type no matter what. So linked lists are more flexible from the perspective of the data storage type.
I think the author means that an array is stored contiguously in memory, while a linked list is not.
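A minimal sketch of that contrast in Python. The Node class and nth helper are hypothetical names written for this example: a linked list reaches element i by following i next-pointers, while an array-backed list indexes directly.

```python
class Node:
    """One linked-list cell: a value plus a reference to the next cell."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def nth(head, index):
    # Walk the chain one node at a time: O(index) steps.
    node = head
    for _ in range(index):
        node = node.next_node
    return node.value


# Build the linked list 10 -> 20 -> 30 -> 40 by prepending in reverse order.
head = None
for value in reversed([10, 20, 30, 40]):
    head = Node(value, head)

items = [10, 20, 30, 40]  # array-backed: items[2] is a direct lookup

print(nth(head, 2), items[2])
```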
