Closest thing to arrays in Elixir

What is the closest thing to arrays in Elixir? By arrays I mean a container for values that I can access in constant time.
I've looked at tuple, but according to the documentation:
Tuples are not meant to be used as a “collection” type (which is also suggested by the absence of an implementation of the Enumerable protocol for tuples): they’re mostly meant to be used as a fixed-size container for multiple elements.
What I actually want to do:
I want to store n processes in an array and periodically pick a random process and send it a message.
I'm open to other suggestions too.

I ended up using a combination of list and registry since I was working with processes. I got many responses on Elixir forum which I'm listing below for future reference:
1. Tuple: stored contiguously in memory, constant access time; editing results in copying the whole structure. Does not implement the Enumerable protocol.
2. Linked list: O(n) access time; prepending is cheaper than appending. Implements the Enumerable protocol.
3. Map: O(log n) read, write, and delete time. Also implements the Enumerable protocol.
4. :array: the array module from Erlang.
5. Registry: (applicable only if storing processes) a local, decentralized, and scalable key-value process storage.
Also, note that 2 and 3 (List and Map) are persistent data structures.

There are also two Elixir packages, Arrays and Tensor, that provide similar functionality.

Elixir has access to an array module via Erlang: http://erlang.org/doc/man/array.html

As with mapping in the Solidity language, Elixir has maps (the Map module).

Related

Is the list append feature a feature of the array data structure?

The array data structure has the following features:
Here is the list of most important array features you must know (i.e. be able to program):
copying and cloning
insertion and deletion
searching and sorting
I am wondering, for the list data type, which can be used for the array data structure, is the append method considered a feature of the array data structure, per the insertion and deletion bullet point?
I would argue that it isn't. I would argue that it is entirely the feature of a list to be able to programmatically append, remove, insertAt, etc. Arrays do not require any functionality other than being a collection of similar types, and in some cases merely a collection of things.
For instance, as referenced in this C article we can see that an array is a collection of similar types. These arrays have no given functionality, and in fact there is no standard, given, way to add or remove to/from them.
Functionally speaking, appending an element to a list is the same as inserting it at the end.
That being said: You seem to have got the concepts of arrays and lists backwards:
A list is typically defined as any kind of data structure which can store an ordered group of things.
An array is something more specific. It's typically defined as a data structure which is made up of a fixed number of objects in memory, stored one after another. Java's array type (e.g. int[]) works this way, for instance.
The web page you are referring to is not helping matters. It's very confusingly written; I'd recommend that you look for another, better reference.

When to use an array vs database

I'm a student and starting to relearn again the basics of programming.
The question came up when I read some Facebook posts saying that most programmers use arrays in their applications and that arrays are useful. I then realized that I never use arrays in my programs.
I read some books, but they only show the syntax of arrays and don't discuss when to apply them in real-world applications. I tried to research this on the Internet but could not find anything. Do you guys have circumstance when you use arrays? Can you please share them with me so I can get an idea?
Also, to clear my doubts, can you please explain why arrays are good for storing information when a database can also store information? When is the right time for me to use database and arrays?
I hope to get a clear answer because I have one remaining semester before my internship and I want to clear my head on this. I have not named a specific programming language because I know most programming languages have arrays.
I hope to get an answer that I can easily understand.
When is the right time for me to use database and arrays?
I can see how databases and arrays may seem like competing solutions to the same problem, but you're comparing apples and oranges. Arrays are a way to represent structured data in memory. Databases are a tool to store data on disk until you need to retrieve it.
The question you pose is kind of like asking: "When is the right time to use an integer to hold a value, vs a piece of paper?" One of them is a structural representation in memory; the other is a storage tool.
Do you guys have circumstance when you use arrays
In most applications, databases and arrays work together. Applications often retrieve data from a database, and hold it in an array for easy processing. Here is a simple example:
Google allows you to receive an alert when something of interest is mentioned on the news. Let's call it the event. Many people can be interested in the event, so Google needs to keep a list of people to alert. How? Probably in a database.
When the event occurs, what does Google do? Well it needs to:
Retrieve the list of interested users from the DB and place it in an array
Loop through the array and send a notification to each user.
In this example, arrays work really well because users form a collection of similarly shaped data structures that needs to be put through a similar process. That's exactly what arrays are for!
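The two steps above can be sketched in Python. This is only an illustration: the `subscriptions` table, its columns, the sample e-mail addresses, and the `interested_users` / `notify_all` helpers are all made up here, and an in-memory SQLite database stands in for whatever storage a real service would use.

```python
import sqlite3

# Hypothetical schema: one row per (user, topic) subscription.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (email TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [
        ("ann@example.com", "elixir"),
        ("bob@example.com", "scala"),
        ("cy@example.com", "elixir"),
    ],
)


def interested_users(conn, topic):
    """Step 1: pull the matching rows out of the database into a list."""
    rows = conn.execute(
        "SELECT email FROM subscriptions WHERE topic = ? ORDER BY email",
        (topic,),
    ).fetchall()
    return [email for (email,) in rows]


def notify_all(users, topic):
    """Step 2: loop over the in-memory list and 'send' each notification."""
    sent = []
    for email in users:
        # Stand-in for a real send; here we only build the message text.
        sent.append(f"To {email}: news about {topic}")
    return sent


users = interested_users(conn, "elixir")
messages = notify_all(users, "elixir")
```

The database holds the long-lived list of subscribers; the Python list exists only while the notifications are being processed.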
Some other common uses of arrays
A bank wants to send invoice and payment due reminders at the end of the day. So it retrieves the users with past due payments from the DB, and loops through the users' array sending notifications.
An IT admin panel wants to check whether all critical websites in a list are still online. So it loops through the array of domains, pings each one and records the results in a log
An educational program wants to perform statistical functions on student test results. So it puts the results in an array to easily perform operations such as average, sum, standardDev...
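As a small sketch of the last case, assuming the test results have already been loaded into a list (the scores below are invented), Python's standard `statistics` module can work on the whole collection at once:

```python
import statistics

scores = [72, 85, 90, 66, 85]  # made-up test results

total = sum(scores)                  # sum over the whole array
average = statistics.mean(scores)    # arithmetic mean
std_dev = statistics.pstdev(scores)  # population standard deviation
```

Each of these operations needs every value in memory together, which is why an array-like structure is a natural fit.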
Arrays are also awesome at keeping things in a predictable order. You can be certain that as you loop forward through an array, you encounter values in the order you put them in. If you're trying to simulate a checkout line at the store, the customers in a queue are a perfect candidate to represent in an array because:
They are similarly shaped data: each customer has a name, cart contents, wait time, and position in line
They will be put through a similar process: each customer needs methods for enter queue, request checkout, approve payment, reject payment, exit queue
Their order should be consistent: When your program executes next(), you should expect that the next customer in line will be the one at the register, not some customer from the back of the line.
Trying to store the checkout queue in a database doesn't make sense because we want to actively work with the queue while we run our simulation, so we need data in memory. The database can hold a historical record of all customers and their checkout outcomes, perhaps for another program to retrieve and use in another way (maybe build customized statistical reports)
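A minimal sketch of such a simulation in Python follows. It uses `collections.deque` rather than a plain array, because removing from the front of a Python list is slow; the ordering guarantee is the same. The `Customer` class, its fields, and `run_checkout` are illustrative names, not part of any library.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Customer:
    # Similarly shaped data: every customer has the same fields.
    name: str
    cart_total: float


def run_checkout(line):
    """Serve customers strictly in arrival order; return names as served."""
    served = []
    while line:
        customer = line.popleft()  # always the customer at the front
        served.append(customer.name)
    return served


line = deque()
line.append(Customer("Ann", 12.50))
line.append(Customer("Bob", 3.20))
line.append(Customer("Cy", 47.00))

order = run_checkout(line)
```

After the loop, `order` lists the customers in the order they joined the line, and the in-memory queue is empty; a database would only come in if the outcomes needed to be kept afterwards.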
There are two different things here. Let me try to explain them simply:
Array: a container object that holds a fixed number of values. The array is stored in memory. It depends on your requirements, but when you need something fixed-size and fast, just use an array.
Database: use it when you have relational data, or when you would like to store data somewhere and not really worry about the size of the objects. You can store 10, 100, or 1000 records in your DB. It is also flexible: you can select, query, and update the data as needed. A simple rule: if you have a large amount of relational data and want to work with it flexibly, use a database.
Hope this helps.
There are a number of ways to store data when you have multiple instances of the same type of data. (For example, say you want to keep information on all the people in your city. There would be some sort of object to hold the information on each person, and you might want to have a data structure that holds the information on every person.)
Java has two main ways to store multiple instances of data in memory: arrays and Collections.
Databases are something different. The differences between a database and an array or collection, as I see them, are:
databases are persistent, i.e. the data will stay around after your program has finished running;
databases can be shared between programs, often programs running in all different parts of the world;
databases can be extremely large, much, much larger than could fit in your computer's memory.
Arrays and collections, however, are intended only for use by one program as it runs. Your program may want to keep track of some information in order to do its calculations. But the data will be in your computer's memory, and therefore other programs on other computers won't be able to access it. And when your program is done running, the data is gone. However, since the data is in memory, it's much faster to use it than data in a database, which is stored on some sort of external device. (This is really an overgeneralization, and doesn't consider things like virtual memory and caching. But it's good enough for someone learning the basics.)
The Java run time gives you three basic kinds of collections: sets, lists, and maps. A set is an unordered collection of unique elements; you use that when the data doesn't belong in any particular order, and the main operations you want are to see if something is in the set, or return all the data in the set without caring about the order. A list is ordered, though; the data has a particular order, and provides operations like "get the Nth element" for some number N, and adding to the ends of the list or inserting in a particular place in the list. A map is unordered like a set, but it also attaches keys to the data, so that you can look for data by giving the key. (Again, this is an overgeneralization. Some sets do have order, like SortedSet. And I haven't even gotten into queues, trees, multisets, etc., some of which are in third-party libraries.)
Java provides a List type for ordered lists, but there are several ways to implement it. One is ArrayList. Like all lists, it provides the capability to get the Nth item in the list. But an ArrayList provides this capability faster; under the hood, it's able to go directly to the Nth item. Some other list implementations don't do that--they have to go through the first, second, etc., items, until they get to the Nth.
An array is similar to an ArrayList, but it has a different syntax. For an array x, you can get the Nth element by referring to x[n], while for an ArrayList you'd say x.get(n). As far as functionality goes, the biggest difference is that for an array, you have to know how big it is before you create it, while an ArrayList can grow. So you'd want to use an ArrayList if you don't know beforehand how big your list will be. If you do know, then an array is a little more efficient, I think. Although you can probably get by mostly with just ArrayList, arrays are still fundamental structures in every computer language. The implementation of ArrayList depends on arrays under the hood, for instance.
Think of an array as a book, and a database as a library. You can't share the book with others at the same time, but you can share a library. You can't put the entire library in one book, but you can check out one book at a time.

How to save R list object to a database?

Suppose I have a list of R objects which are themselves lists. Each list has a defined structure: data, model which fits data and some attributes for identifying data. One example would be time series of certain economic indicators in particular countries. So my list object has the following elements:
data - the historical time series for economic indicator
country - the name of the country, USA for example
name - the indicator name, GDP for example
model - ARIMA orders found out by auto.arima in suitable format, this again may be a list.
This is just an example. As I said suppose I have a number of such objects combined into a list. I would like to save it into some suitable format. The obvious solution is simply to use save, but this does not scale very well for large number of objects. For example if I only wanted to inspect a subset of objects, I would need to load all of the objects into memory.
If my data is a data.frame I could save it to database. If I wanted to work with particular subset of data I would use SELECT and rely on database to deliver the required subset. SQLite served me well in this regard. Is it possible to replicate this for my described list object with some fancy database like MongoDB? Or should I simply think about how to convert my list to several related tables?
My motivation for this is to be able to easily generate various reports on the fitted models. I can write a bunch of functions which produce some report on a given object and then just use lapply on my list of objects. Ideally I would like to parallelise this process, but this is another problem.
I think I explained the basics of this somewhere once before---the gist of it is that
R has complete serialization and deserialization support built in, so you can in fact take any existing R object and turn it into either a binary or textual serialization. My digest package uses that to turn the serialization into a hash using different hash functions
R has all the db connectivity you need.
Now, what a suitable format and db schema is ... will depend on your specifics. But there is (as usual) nothing in R stopping you :)
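The same idea can be sketched in Python as an analogy (this is not R code): `pickle` plays the role of R's built-in serialization and SQLite is the database. The `fits` table, its columns, and the `save_fit` / `load_fits` helpers are assumptions made up for this illustration, as is the sample data. The identifying attributes are stored as ordinary columns so that a subset can be selected without deserializing everything.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fits ("
    "country TEXT, name TEXT, payload BLOB, "
    "PRIMARY KEY (country, name))"
)


def save_fit(conn, obj):
    """Store the identifying fields as columns and the whole object as a blob."""
    conn.execute(
        "INSERT OR REPLACE INTO fits VALUES (?, ?, ?)",
        (obj["country"], obj["name"], pickle.dumps(obj)),
    )


def load_fits(conn, country):
    """Deserialize only the objects for one country."""
    rows = conn.execute(
        "SELECT payload FROM fits WHERE country = ? ORDER BY name",
        (country,),
    ).fetchall()
    # Only unpickle data you trust; pickle can execute code on load.
    return [pickle.loads(payload) for (payload,) in rows]


objects = [
    {"country": "USA", "name": "GDP", "data": [1.0, 1.2, 1.5], "model": {"order": (1, 1, 0)}},
    {"country": "USA", "name": "GNP", "data": [0.9, 1.1], "model": {"order": (0, 1, 1)}},
    {"country": "Germany", "name": "GDP", "data": [2.0, 2.1], "model": {"order": (2, 0, 0)}},
]
for obj in objects:
    save_fit(conn, obj)

usa = load_fits(conn, "USA")
```

Here only the two USA objects are read back and deserialized; the German one stays in the database untouched.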
This question has been inactive for a long time. Since I had a similar concern recently, I want to add the pieces of information that I've found out. I recognise these three demands in the question:
to have the data stored in a suitable structure
scalability in terms of size and access time
the possibility to efficiently read only subsets of the data
Besides the option of using a relational database, one can also use the HDF5 file format, which is designed to store a large number of possibly large objects. The choice depends on the type of data and the intended way to access it.
Relational databases should be favoured if:
the atomic data items are small-sized
the different data items possess the same structure
it is not known in advance which subsets of the data will be read out
convenient transfer of the data from one computer to another is not an issue or the computers where the data is needed have access to the database.
The HDF5 format should be preferred if:
the atomic data items are themselves large objects (e.g. matrices)
the data items are heterogeneous and cannot be combined into a table-like representation
most of the time the data is read out in groups which are known in advance
moving the data from one computer to another should not require much effort
Furthermore, one can distinguish between relational and hierarchical relationships, where the latter is contained in the former. Within an HDF5 file, the information chunks can be arranged in a hierarchical way, e.g.:
/Germany/GDP/model/...
/Germany/GNP/data
/Austria/GNP/model/...
/Austria/GDP/data
The rhdf5 package for handling HDF5 files is available on Bioconductor. General information on the HDF5 format is available here.
Not sure if it is the same, but I had some good experience with time series objects with:
str()
Maybe you can look into that.

Array/list vs Dictionary (why we have them at first place)

To me they are both the same, which is why I am wondering why we have a dictionary data structure when we can do everything with arrays/lists. What is so fancy about dictionaries?
Arraylists just store a set of objects (that can be accessed randomly). Dictionaries store pairs of objects. This makes array/lists more suitable when you have a group of objects in a set (prime numbers, colors, students, etc.). Dictionaries are better suited for showing relationships between a pair of objects.
Why do we need dictionaries? Let's say you have some data you need to convert from one form to another, like Roman numeral characters to their values. Without dictionaries, you'd have to hack this association together with two arrays, where you first find the position of the key in the first list and then access that position in the second. This is terribly error-prone and inefficient, and dictionaries provide a more direct approach.
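A short Python sketch of the Roman numeral case shows both approaches side by side. The function names are made up for this illustration, and `roman_to_int` is added only to show the dictionary in use.

```python
# Two parallel lists: position i in `symbols` pairs with position i in `values`.
symbols = ["I", "V", "X", "L", "C", "D", "M"]
values = [1, 5, 10, 50, 100, 500, 1000]


def value_with_lists(ch):
    # Linear search for the key, then a second lookup by position.
    return values[symbols.index(ch)]


# One dictionary: each key maps directly to its value.
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def value_with_dict(ch):
    return ROMAN[ch]


def roman_to_int(text):
    """Convert a Roman numeral string using the dictionary."""
    total = 0
    for i, ch in enumerate(text):
        value = ROMAN[ch]
        nxt = ROMAN[text[i + 1]] if i + 1 < len(text) else 0
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        total += -value if value < nxt else value
    return total
```

With the parallel lists, the two structures must be kept in step by hand; with the dictionary, the pairing is part of the data structure itself.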
Arrays provide random access of a sequential set of data. Dictionaries (or associative arrays) provide a map from a set of keys to a set of values.
I believe you are comparing apples and oranges - they serve two completely different purposes and are both useful data structures.
Most of the time a dictionary-like type is built as a hash table - this type is very useful as it provides very fast lookups on average (depending on the quality of the hashing algorithm).
The confusion lies in the different naming conventions in different languages. In my understanding, what is called a "Dictionary" in Python is the same as "Associative Array" in PHP.
To build on what Andrew said, in some languages such as PHP and Javascript, the array can also function as a dictionary (known as associative arrays). It also comes down to loose v strict typing in the language.
You could in theory do everything with dictionaries.
But do not forget that at some point the program runs on a real machine which has limitations due to the hardware: processor, memory, nature of the storage (disc/SSD) ...
Behind the scenes, dictionaries often use a hash table.
In some languages you can choose between many different types of list/array and hash table, as there are many different implementations of those structures, each with advantages and disadvantages.
Use an array when you work with a sequence of elements or need to randomly access an element at a given index (0, 1, 2, ...)
Use a dictionary when you have key/value format and need fast retrieval via key
If you want to understand more about these I recommend you learn more about data structures as they are fundamental
NOTE: depending on the language the name of those structures may vary and is a source of confusion.

When should I use Scala's Array instead of one of the other collections?

This is more a question of style and preference but here goes: when should I use scala.Array? I use List all the time and occasionally run into Seq, Map and the like, but I've never used nor seen Array in the wild. Is it just there for Java compatibility? Am I missing a common use-case?
First of all, let's make a disclaimer here. Scala 2.7's Array tries to be a Java Array and a Scala Collection at the same time. It mostly succeeds, but fails at both for some corner cases. Unfortunately, these corner cases can happen to good people with normal code, so Scala 2.8 is departing from that.
On Scala 2.8, there's Array, which is a Java Array. That means it is a contiguous memory space, which stores either references or primitives (and, therefore, may have different element sizes), and can be randomly accessed pretty fast. It also has lousy methods, a horrible toString implementation, and performs badly when using generics and primitives at the same time (e.g.: def f[T](a: Array[T]) = ...; f(Array(1,2,3))).
And, then, there is GenericArray, which is a Scala Collection backed by an Array. It always stores boxed primitives, so it doesn't have the performance problems when mixing primitives and generics but, on the other hand, it doesn't have the performance gains of a purely primitive (non-generic) array.
So, when to use what? An Array has the following characteristics:
O(1) random read and write
O(n) append/prepend/insert/delete
mutable
If you don't need generics, or your generics can be stated as [T <: AnyRef], thus excluding primitives, which are AnyVal, and those characteristics are optimal for your code, then go for it.
If you do need generics, including primitives, and those characteristics are optimal for your code, use GenericArray on Scala 2.8. Also, if you want a true Collection, with all of its methods, you may want to use it as well, instead of depending on implicit conversions.
If you want immutability or if you need good performance for append, prepend, insert or delete, look for some other collection.
An array is appropriate when you have a number of items of the same (or compatible) class, and you know in advance the exact count of those items, or a reasonable upper bound, and you're interested in fast random access and perhaps in-place alteration of items, but after setting it up, you will never ever insert or remove items from somewhere in the list.
Or stated another way, it's an aggregate data structure with fewer bells and whistles than the Collection types, with slightly less overhead and slightly better performance depending on how it's used.
A very contrived example: You're in the business of producing functions, and quality testing for these functions involves checking their performance or results for a set of 1000 fixed input values. Moreover, you decide not to keep these values in a file, but rather you hard code them into your program. An array would be appropriate.
Interfacing with Java APIs is one case. Also, unlike Java arrays, Scala arrays are invariant, and hence don't have any advantage over lists in that respect.

Resources