Performance of tail function for Scala Arrays

The Scala docs say that the performance of tail for an Array is linear while head is constant. Since the whole block that contains the array elements is brought into cache, I wouldn't expect to see any difference between head and tail for an array. I would appreciate it if someone could explain why tail performance for arrays in Scala is linear.

The tail function creates a new array containing all of the elements except the first. To do this we need to create a copy of the array (minus the first element), which is a linear time operation. As the array gets larger there is more to copy.
Use List instead if you require efficient head and tail operations.
You may be confusing tail with last:
head gets the first element: O(1) for List and Array
last gets the last element: O(n) for List, O(1) for Array
tail gets everything except the first: O(1) for List, O(n) for Array
init gets everything except the last: O(n) for List and Array

There is a pretty big difference between lists and arrays. head and tail are the canonical interface to lists, which in Scala are singly linked lists. head refers to the first thing in the list and tail refers to all of the elements after the first. Since a linked list stores its tail as a pointer, tail is a constant time operation.
However, things are a little different for arrays. An array is used for fast random access and refers to a contiguous block of memory. Scala still exposes the list-like interface of head and tail, but it has to do things a little differently to simulate that. In order to simulate tail, it has to make a new array containing all elements except the first. It has to copy all of those values into the new array, which is a linear time operation.
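The difference can be sketched in Java (chosen here only for illustration; the class and method names below are hypothetical and this is not how the Scala library is implemented). The linked-list tail hands back an existing reference, while the array tail has to allocate and fill a new array:

```java
import java.util.Arrays;

// Hypothetical names for illustration; not part of the Scala library.
final class TailSketch {
    // Minimal immutable singly linked ("cons") list node.
    static final class Cons {
        final int head;
        final Cons tail; // null marks the empty list
        Cons(int head, Cons tail) { this.head = head; this.tail = tail; }
    }

    // O(1): the rest of the list already exists as a node; return the reference.
    static Cons tail(Cons list) {
        return list.tail;
    }

    // O(n): an array has no shareable "rest", so n - 1 elements are copied.
    static int[] tail(int[] array) {
        return Arrays.copyOfRange(array, 1, array.length);
    }

    public static void main(String[] args) {
        Cons list = new Cons(1, new Cons(2, new Cons(3, null)));
        // Same node as list.tail, nothing was copied.
        System.out.println(tail(list) == list.tail);
        // A freshly allocated array holding the remaining elements.
        System.out.println(Arrays.toString(tail(new int[] {1, 2, 3})));
    }
}
```

Caching makes the copy fast per element, but the number of elements copied still grows with the array, which is why the operation is classified as linear.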


Use cases of linked list in JavaScript especially in Reactjs /ExpressJS and nosql database [duplicate]

I use a lot of lists and arrays but I have yet to come across a scenario in which the array list couldn't be used just as easily as, if not easier than, the linked list. I was hoping someone could give me some examples of when the linked list is notably better.
Linked lists are preferable over arrays when:
you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
you don't know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big
you don't need random access to any elements
you want to be able to insert items in the middle of the list (such as a priority queue)
Arrays are preferable when:
you need indexed/random access to elements
you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas in a linked list you need to look up each node by following the pointer from the previous one, which may cause page faults and hurt performance.
memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
Array Lists (like those in .Net) give you the benefits of arrays, but dynamically allocate resources for you so that you don't need to worry too much about list size, and you can delete items at any index without having to shuffle the remaining elements around yourself (the list shifts them internally). Performance-wise, array lists are slower than raw arrays.
Arrays have O(1) random access, but are really expensive to add stuff onto or remove stuff from.
Linked lists are really cheap to add or remove items anywhere and to iterate, but random access is O(n).
Algorithm             ArrayList   LinkedList
seek front            O(1)        O(1)
seek back             O(1)        O(1)
seek to index         O(1)        O(N)
insert at front       O(N)        O(1)
insert at back        O(1)        O(1)
insert after an item  O(N)        O(1)
ArrayLists are good for write-once-read-many or appenders, but bad at add/remove from the front or middle.
To add to the other answers, most array list implementations reserve extra capacity at the end of the list so that new elements can be added to the end of the list in O(1) time. When the capacity of an array list is exceeded, a new, larger array is allocated internally, and all the old elements are copied over. Usually, the new array is double the size of the old one. This means that on average, adding new elements to the end of an array list is an O(1) operation in these implementations. So even if you don't know the number of elements in advance, an array list may still be faster than a linked list for adding elements, as long as you are adding them at the end. Obviously, inserting new elements at arbitrary locations in an array list is still an O(n) operation.
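A minimal sketch of the doubling strategy described above, assuming a capacity that starts at 1 and doubles when full. GrowableIntArray and its copies counter are hypothetical names for illustration; this is not the source of any real library class, and real implementations may use a different growth factor:

```java
import java.util.Arrays;

// Hypothetical sketch of a growable int array; not java.util.ArrayList itself.
final class GrowableIntArray {
    private int[] data = new int[1];
    private int size = 0;
    long copies = 0; // how many element copies resizing has caused so far

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            copies += size;                              // every old element is copied once
            data = Arrays.copyOf(data, data.length * 2); // O(n), but rare
        }
        data[size++] = value;                            // the common O(1) path
    }

    int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        return data[index];
    }

    int size() { return size; }

    public static void main(String[] args) {
        GrowableIntArray a = new GrowableIntArray();
        for (int i = 0; i < 1_000_000; i++) a.add(i);
        // Total copies stay below 2 * size, so the average cost per add is constant.
        System.out.println(a.copies + " copies for " + a.size() + " adds");
    }
}
```

Because each resize doubles the capacity, the resize costs form a geometric series (1 + 2 + 4 + ...), which sums to less than twice the final element count. That is what "O(1) on average" means here.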
Accessing elements in an array list is also faster than a linked list, even if the accesses are sequential. This is because array elements are stored in contiguous memory and can be cached easily. Linked list nodes can potentially be scattered over many different pages.
I would recommend only using a linked list if you know that you're going to be inserting or deleting items at arbitrary locations. Array lists will be faster for pretty much everything else.
The advantage of lists appears if you need to insert items in the middle and don't want to start resizing the array and shifting things around.
You're correct in that this is typically not the case. I've had a few very specific cases like that, but not too many.
It all depends on what type of operation you are doing while iterating. All data structures trade off time against memory, and we should choose the right one depending on our needs. So there are some cases where a LinkedList is faster than an array and vice versa. Consider the three basic operations on data structures.
Searching
Since an array is an index-based data structure, array.get(index) takes O(1) time. A linked list is not indexed, so you need to traverse up to index (where index <= n and n is the size of the linked list). So an array is faster than a linked list when you need random access to elements.
Q. So what's the beauty behind this?
Arrays are contiguous memory blocks, so large chunks of them are loaded into the cache upon first access. This makes it comparatively quick to access the remaining elements of the array: the more elements we access, the better the locality of reference and the fewer the cache misses. Cache locality means the data being operated on is already in the cache, so operations execute much faster than when they have to go to main memory. Basically, with an array we maximize the chances of sequentially accessed elements being in the cache. Linked lists aren't necessarily in contiguous blocks of memory, and there's no guarantee that items which appear sequentially in the list are actually arranged near each other in memory. This means fewer cache hits, i.e. more cache misses, because we may need to read from memory for every access to a linked list element, which increases access time and degrades performance. So if we are mostly doing random access, an array will be faster.
Insertion
Insertion is easy and fast in a LinkedList: it is an O(1) operation (in Java) once you are at the insertion point, because you only need to update pointers and never need to resize. With an array, if the array is full we need to copy its contents to a new, larger array, which makes inserting an element into an ArrayList O(n) in the worst case. An ArrayList also has to shift the following elements if you insert anywhere except at the end.
Deletion
Deletion works like insertion and is likewise cheaper in a LinkedList than in an array.
These are the most commonly used implementations of Collection.
ArrayList:
insert/delete at the end generally O(1), worst case O(n)
insert/delete in the middle O(n)
retrieve any position O(1)
LinkedList:
insert/delete in any position O(1) (only if you already have a reference to the element)
retrieve in the middle O(n)
retrieve first or last element O(1)
Vector: don't use it. It is an old implementation similar to ArrayList but with all methods synchronized. It is not the correct approach for a shared list in a multithreading environment.
HashMap
insert/delete/retrieve by key in O(1)
TreeSet
insert/delete/contains in O(log N)
HashSet
insert/remove/contains/size in O(1)
In reality memory locality has a huge performance influence in real processing.
The increased use of disk streaming in "big data" processing vs random access shows how structuring your application around this can dramatically improve performance on a larger scale.
If there is any way to access an array sequentially, that is by far the best-performing option. Designing with this as a goal should at least be considered if performance is important.
I think the main difference is whether you frequently need to insert or remove stuff from the top of the list.
With an array, if you remove something from the top of the list then the complexity is O(n) because all of the remaining array elements have to shift.
With a linked list, it is O(1): to insert you only need to create the node, point its next reference at the previous head, and reassign the head.
When frequently inserting or removing at the end of the list, arrays are preferable because the complexity will be O(1) and no reindexing is required, but for a singly linked list without a tail pointer it will be O(n) because you need to go from the head to the last node.
Searching a sorted array can be O(log n) with binary search. A linked list has no random access, so searching it stays O(n) even when it is sorted.
Hmm, I guess an ArrayList can be used in cases like the following:
you are not sure how many elements will be present
but you need to access all the elements randomly through indexing
For example, you need to import and access all elements in a contact list (the size of which is unknown to you)
Use a linked list rather than an array for radix sort and for polynomial operations.
1) As explained above, the insert and remove operations give good performance (O(1)) in LinkedList compared to ArrayList (O(n)). Hence if the application requires frequent addition and deletion, LinkedList is the better choice.
2) Search (get method) operations are fast in ArrayList (O(1)) but not in LinkedList (O(n)), so if there are fewer add and remove operations and more search operations, ArrayList would be your best bet.
I did some benchmarking, and found that the list class is actually faster than LinkedList for random inserting:
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = 20000;
            Random rand = new Random(12345);

            // LinkedList: Find is an O(n) scan, AddBefore itself is O(1).
            Stopwatch watch = Stopwatch.StartNew();
            LinkedList<int> ll = new LinkedList<int>();
            ll.AddLast(0);
            for (int i = 1; i < count; i++)
            {
                ll.AddBefore(ll.Find(rand.Next(i)), i);
            }
            Console.WriteLine("LinkedList/Random Add: {0}ms", watch.ElapsedMilliseconds);

            // List: IndexOf is an O(n) scan, Insert shifts the following elements.
            watch = Stopwatch.StartNew();
            List<int> list = new List<int>();
            list.Add(0);
            for (int i = 1; i < count; i++)
            {
                list.Insert(list.IndexOf(rand.Next(i)), i);
            }
            Console.WriteLine("List/Random Add: {0}ms", watch.ElapsedMilliseconds);

            Console.ReadLine();
        }
    }
}
It takes 900 ms for the linked list and 100 ms for the List class.
It creates lists of consecutive integers. Each new integer is inserted before a randomly chosen number that is already in the list.
Maybe the List class uses something better than just an array.
Arrays, by far, are the most widely used data structures. However, linked lists prove useful in their own unique way where arrays are clumsy - or expensive, to say the least.
Linked lists are useful to implement stacks and queues in situations where their size is subject to vary. Each node in the linked list can be pushed or popped without disturbing the majority of the nodes. Same goes for insertion/deletion of nodes somewhere in the middle. In arrays, however, all the elements have to be shifted, which is an expensive job in terms of execution time.
Binary trees and binary search trees, hash tables, and tries are some of the data structures wherein - at least in C - you need linked lists as a fundamental ingredient for building them up.
However, linked lists should be avoided in situations where it is expected to be able to call any arbitrary element by its index.
A simple answer to the question can be given using these points:
Arrays are to be used when a collection of elements of the same type is required, stored in one contiguous block. A linked list, in contrast, is a collection of separately allocated elements, known as nodes, linked together by pointers.
In an array, one can visit any element in O(1) time. In a linked list, we need to traverse from the head to the required node, which takes O(n) time.
For arrays, a specific size needs to be declared initially. But linked lists are dynamic in size.

How is a linked list faster than an array for insert and delete operations although it takes O(n) for both data structures?

The worst case running time of Insert and delete operations in an array is O(n), because we might need to make n shifts.
Same is the case for linked list too, if we want to insert or delete ith element we might need to traverse the whole list to reach the position where insert/delete is expected to be done. So Linked list also takes O(n) time.
So why is a linked list preferred where insert/delete-intensive operations are done?
If you want to insert/delete the ith element in an array, locating it only takes O(1) because of indexing. For example, you can access the ith element of an array through array[i]. However, inserting/deleting that element will, in the worst case, take O(n) time. For example, if you insert an element at position 0, you have to shift all the elements one spot to the right, which requires a traversal of the whole array.
If you want to insert/delete the ith element in a linked list, locating it will take O(n) in the worst case because you have to keep a count and a pointer while traversing the list one element at a time. Once you arrive at the ith node, inserting/deleting only takes O(1) time since it's just a rearrangement of pointers, no shifting.
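The two cost profiles can be put side by side in a small Java sketch. The helper names (insertAt, nodeAt, insertAfter) are hypothetical and chosen for illustration; the array version assumes the caller leaves at least one free slot at the end:

```java
import java.util.Arrays;

// Hypothetical helper names, for illustration only.
final class InsertSketch {
    static final class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Array: finding slot `index` is O(1), but making room shifts up to n elements.
    // Assumes count < array.length, i.e. there is a free slot at the end.
    static void insertAt(int[] array, int count, int index, int value) {
        System.arraycopy(array, index, array, index + 1, count - index);
        array[index] = value;
    }

    // Linked list: walking to position `index` is O(n) ...
    static Node nodeAt(Node head, int index) {
        Node current = head;
        for (int i = 0; i < index; i++) current = current.next;
        return current;
    }

    // ... but once there, the insert is a pointer rearrangement, O(1).
    static void insertAfter(Node node, int value) {
        node.next = new Node(value, node.next);
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 4, 0};          // three used slots, one spare
        insertAt(array, 3, 2, 3);
        System.out.println(Arrays.toString(array));

        Node head = new Node(1, new Node(2, new Node(4, null)));
        insertAfter(nodeAt(head, 1), 3);
        for (Node n = head; n != null; n = n.next) System.out.print(n.value + " ");
        System.out.println();
    }
}
```

Both paths do O(n) work overall when starting from an index; the difference is where the cost sits (shifting for the array, walking for the list).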
As to why linked lists are preferred when there are many inserts/deletions, I would say that one reason is that with linked lists you don't need to know how big it has to be ahead of time. Whereas with arrays, it may have to be resized often in anticipation of more/less elements.
Searching for an element is the same in both cases (O(n)). The difference is in inserting and deleting once you are at the specified position. In this case, inserting and deleting is O(1) in a linked list (as you only need to reset two pointers), but O(n) in an array (as you need O(n) shifts).
Another difference is in traversing from one position to another. In a list this traversal takes O(n), but in an array it is O(1).
The benefit of the O(1) removal/insert from the linked list is realized when there's an additional data structure that points directly to the nodes. This avoids the O(n) cost of the list traversal.
A good example is a bounded size LRU cache where the key-value pairs are represented in a map which also keeps pointers to the linked list. The list represents the access order and here's where the LRU takes advantage of the fast linked list access. Taking an element from the middle and putting it in front is O(1).
Every key access (O(1)) unlinks the associated node from the middle of the list and moves it to the head of the list (in O(1) time). When the cache gets full, the tail node of the list gets removed (it represents the least recently used key/value), together with the key-value pair represented by it.
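A minimal sketch of such a cache in Java, assuming a HashMap from keys to list nodes and a doubly linked list with two sentinel nodes. LruCache and its members are hypothetical names for illustration, not an existing library class:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical LRU cache sketch: a HashMap for O(1) lookup plus a doubly
// linked list (with sentinel nodes) that records access order.
final class LruCache<K, V> {
    private final class Node {
        K key;
        V value;
        Node prev, next;
    }

    private final int capacity;
    private final Map<K, Node> index = new HashMap<>();
    private final Node head = new Node(); // sentinel before the most recent entry
    private final Node tail = new Node(); // sentinel after the least recent entry

    LruCache(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    V get(K key) {
        Node node = index.get(key);
        if (node == null) return null;
        unlink(node);     // O(1): the map handed us the node directly
        linkFirst(node);  // O(1): it is now the most recently used
        return node.value;
    }

    void put(K key, V value) {
        Node node = index.get(key);
        if (node != null) {
            node.value = value;
            unlink(node);
        } else {
            if (index.size() == capacity) {
                Node eldest = tail.prev; // least recently used entry
                unlink(eldest);
                index.remove(eldest.key);
            }
            node = new Node();
            node.key = key;
            node.value = value;
            index.put(key, node);
        }
        linkFirst(node);
    }

    int size() { return index.size(); }

    private void unlink(Node node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    private void linkFirst(Node node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes most recently used
        cache.put("c", 3);  // cache is full, so "b" is evicted
        System.out.println(cache.get("b")); // null, because "b" was evicted
    }
}
```

No operation ever walks the list: the map supplies the node, and unlinking or relinking it touches only its neighbours.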

Arrays and linked List

I want to know about arrays and linked lists. Which is faster if you try to sort the elements? Which is faster to index, an array or a linked list? And lastly, if we try to find an element in an array and in a linked list, which will take less time to find it?
I know a little bit about arrays and linked lists; correct me if I am wrong. Arrays are a fixed-size, contiguous-memory data structure, while a linked list is not fixed in size.
From SCJP certification book:
ArrayList:
Think of this as a growable array. It gives you fast iteration and fast
random access. To state the obvious: it is an ordered collection (by index), but not
sorted. You might want to know that as of version 1.4, ArrayList now implements
the new RandomAccess interface—a marker interface (meaning it has no methods)
that says, "this list supports fast (generally constant time) random access." Choose
this over a LinkedList when you need fast iteration but aren't as likely to be doing a
lot of insertion and deletion
LinkedList:
A LinkedList is ordered by index position, like ArrayList, except
that the elements are doubly-linked to one another. This linkage gives you new
methods (beyond what you get from the List interface) for adding and removing
from the beginning or end, which makes it an easy choice for implementing a stack
or queue. Keep in mind that a LinkedList may iterate more slowly than an ArrayList,
but it's a good choice when you need fast insertion and deletion. As of Java 5, the
LinkedList class has been enhanced to implement the java.util.Queue interface. As
such, it now supports the common queue methods:
peek(), poll(), and offer()
An ArrayList is backed by an array, but the ArrayList class does everything for you, including resizing the array, so you don't have to care about size.
Add to end : constant time
Add elsewhere : linear time (average is n/2 = O(n))
Get : constant
Delete : same as add
A LinkedList is backed by a linked list, meaning each node holds a reference to the next node and the previous node.
Add anywhere : constant time (once you are at the position)
Delete from anywhere : constant time (once you are at the position)
Get : linear time (average is n/2 = O(n))
But BOTH are lists, which means they are not fixed in size. The only difference in use is that some methods are faster or slower in one List implementation than in the other.
Arrays are faster for all of these operations. The only time that linked lists are faster is when you need to delete or add an element. But for a fixed collection of items, arrays are always faster.

Linked List - Appending node: loop or pointer?

I am writing a linked list datatype and as such I currently have the standard head pointer which references the first item, and then a next pointer for each element that points to the following one such that the final element has next = NULL.
I am just curious what the pros/cons or best practices are for keeping track of the last node. I could have a 'tail' pointer which always points to the last node making it easy to append, or I could loop over the list starting from the head pointer to find the last node when I want to append. Which method is better?
It is usually a good idea to store the tail. If we think about the complexity of adding an item at the end (if this is an operation you commonly do) it will be O(n) time to search for the tail, or O(1) if you store it.
Another option you can consider is to make your list doubly linked. This way when you want to delete the end of the list, by storing tail you can delete nodes in O(1) time. But this will incur an extra pointer to be stored per element of your list (not expensive, but it adds up, and should be a consideration for memory constrained systems).
In the end, it is all about the operations you need to do. If you never add, delete or operate at the end of your list, there is no reason to do this. I recommend analyzing the complexity of your most common operations and basing your decision on that.
Depends on how often you need to find the last node, but in general it is best to have a tail pointer.
There's very little cost to just keeping and updating a tail pointer, but you have to remember to update it! If you can keep it updated, then it will make append operations much faster (O(1) instead of O(n)). So, if you usually add elements to the end of the list, then you should absolutely create and maintain a tail pointer.
If you have a doubly linked list, where every element contains a pointer both to the next and prev elements, then a tail pointer is almost universally used.
On the other hand, if this is a sorted list, then you won't be appending to the end, so the tail pointer would never be used. Still, keeping the pointer around is a good idea, just in case you decide you need it in the future.
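A minimal sketch of the tail-pointer approach in Java, assuming a singly linked list of ints. TailedList and its methods are hypothetical names for illustration. The point to notice is that every operation that can change the last node must also keep tail up to date:

```java
import java.util.NoSuchElementException;

// Hypothetical singly linked list that keeps both head and tail references.
final class TailedList {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;
    private int size;

    // O(1) append: no walk from head, but tail must be updated every time.
    void append(int value) {
        Node node = new Node(value);
        if (tail == null) {
            head = node;      // first element: head and tail are the same node
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(1) removal at the front; resets tail when the list becomes empty.
    int removeFirst() {
        if (head == null) throw new NoSuchElementException();
        int value = head.value;
        head = head.next;
        if (head == null) tail = null;
        size--;
        return value;
    }

    int size() { return size; }

    public static void main(String[] args) {
        TailedList list = new TailedList();
        for (int i = 1; i <= 3; i++) list.append(i);
        while (list.size() > 0) System.out.print(list.removeFirst() + " ");
        System.out.println();
    }
}
```

Forgetting the `tail = null` reset in removeFirst is the kind of bookkeeping mistake the answers above warn about: a later append would attach the new node to a node that is no longer in the list.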

