Can someone explain the difference between the vector and linked list ADTs in the context of the C programming language?
Thanks.
Well, in C there are no "vector" and "list" data types available to you directly, as there are in the C++ standard library. But in terms of abstract data types, a vector is usually considered to represent contiguous storage, and a linked list is considered to be represented by individual cells linked together. Vectors provide fast constant-time random-access read and write operations, but inserting and deleting vector elements takes linear time. Lists have linear lookup performance to find an element to read or write but, given an element's location, have constant-time insertion and deletion. You can also add items to the start and to the end of a list in constant time (if the ADT implementation caches the location of the last element in the list).
A vector is often implemented as an array occupying a contiguous block of memory, whereas a list can be spread across memory, since each element holds pointers to one or more other elements (it could be doubly linked). This gives vectors the access-speed advantage but lists the insertion/deletion advantage.
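To make the insertion trade-off concrete, here is a minimal sketch in C. The names `struct vec`, `struct node`, `vec_insert` and `list_insert_after` are made up for illustration; C itself provides none of them. The vector insert has to shift every later element, while the list insert only relinks two pointers once the position is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "vector": all elements sit side by side in one block. */
struct vec {
    int *data;   /* contiguous storage */
    size_t len;  /* elements in use */
    size_t cap;  /* elements allocated */
};

/* Hypothetical singly linked list cell. */
struct node {
    int value;
    struct node *next;
};

/* O(n) insertion: every element at or after pos moves one slot right.
   Assumes the caller has already made sure that len < cap. */
void vec_insert(struct vec *v, size_t pos, int value)
{
    assert(pos <= v->len && v->len < v->cap);
    memmove(&v->data[pos + 1], &v->data[pos],
            (v->len - pos) * sizeof v->data[0]);
    v->data[pos] = value;
    v->len++;
}

/* O(1) insertion once the location is known: relink two pointers.
   Returns the new node, or NULL if allocation fails. */
struct node *list_insert_after(struct node *where, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = where->next;
    where->next = n;
    return n;
}
```

Note that the list version pays for a `malloc` on every insertion, so "constant time" does not automatically mean "faster" for small collections.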
Basically, a vector resides in contiguous memory. A linked list contains pointers to the previous and next structures. Vector is faster for random access, the linked list is better for growing.
http://www.codeguru.com/forum/archive/index.php/t-309352.html
A vector is a dynamic array: its elements are adjacent in memory. The elements of a linked list are not necessarily adjacent.
I was recently in an interview that required me to choose between the two data structures for a problem, and now I have this question:
What is the reasoning for using a Stack over an array if the only operations needed are push and pop? An array provides constant time for appending and popping the last element from it and it takes up generally less memory than implementing a Stack with a LinkedList. It also provides random access should it be required. Is the only reasoning because an array is typically of fixed size, so we need to dynamically resize the array for each element we put in? This still is in constant time though isn't it unless the penalty is disproportionate?
There are several aspects to consider here...
First, a Stack is an abstract data type. It doesn't define how to implement itself.
An array is (generally) a well defined concrete implementation, and might even be fixed size unless explicitly defined to be dynamic.
A dynamic array can be implemented such that it automatically grows by some factor when exhausted and might also shrink when its fill rate drops. These operations are not constant time, but they amortize to constant time because the array doesn't grow or shrink on every operation. In terms of memory usage, it's hard to imagine an array being more expensive than a linked list unless it is extremely underused.
The main problem with an array is large allocation size. This is both a problem of maximum limitation and memory fragmentation. Using a linked list avoids both issues because every entry has a small memory footprint.
In some languages like C++, the underlying container that the 'stack' class uses can actually be changed between a dynamic array (vector), linked list (list), or even a double-ended queue (deque). I only mention this because it's typically not fair to compare a stack with an array (one is an interface, the other is a data structure).
Most dynamic array implementations will allocate more space than is needed, and upon filling the array they will resize to 2x the size, and so on. This avoids frequent allocations and keeps the performance of push generally constant time. The occasional resize does require copying the elements, which is O(n), though this is usually said to be amortized to constant time. So in general, you are correct that this is efficient.
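The doubling strategy described above could look roughly like this in C. This is only a sketch: the `struct stack` type, the function names, and the starting capacity of 4 are assumptions made for the example, not taken from any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array-backed stack; initialise with `struct stack s = {0};`. */
struct stack {
    int *items;
    size_t len;  /* elements in use */
    size_t cap;  /* elements allocated */
};

/* Push one value, doubling the capacity when the array is full.
   Returns false if the allocation fails (the stack is left unchanged). */
bool stack_push(struct stack *s, int value)
{
    assert(s != NULL);
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;  /* 4 is an arbitrary start */
        int *p = realloc(s->items, new_cap * sizeof *p);
        if (p == NULL)
            return false;  /* the old block is still valid */
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->len++] = value;
    return true;
}

/* Pop the most recently pushed value into *out; false if the stack is empty. */
bool stack_pop(struct stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->len == 0)
        return false;
    *out = s->items[--s->len];
    return true;
}

/* Release the backing array and reset the stack to empty. */
void stack_free(struct stack *s)
{
    free(s->items);
    s->items = NULL;
    s->len = s->cap = 0;
}
```

With this policy, pushing 100 values triggers only six allocations (capacities 4, 8, 16, 32, 64, 128), which is why the per-push cost averages out to a constant.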
Linked lists, on the other hand, typically require an allocation for every push, which can be somewhat expensive, and the nodes they create are larger than a single element in the array.
One possible advantage of linked lists, however, is that they do not require contiguous memory. If you have a great many elements, it's possible that you will fail to allocate a large enough block of memory for an array. Having said that, linked lists take up more memory... so it's a bit of a wash.
In C++ for example, the stack by default uses the deque container. The deque is typically implemented as a dynamic array of 'pages' of memory. Each page of memory is fixed in size, which allows the container to actually have random access properties. Moreover, since each page is separate, then the entire container does not require contiguous memory meaning that it can store many many elements. Resizing is also cheap for a deque because it simply allocates another page, making it a great choice for a large stack.
I was reviewing an interview question and comparing notes with a friend, and we have different ideas on one with respect to CPU caching.
Consider a large set of data, such as a large array of double, ie:
double data[1024];
Consider using a dynamically allocated on-the-fly linked list to store the same number of elements. The question asked for a description of a number of trade-offs:
1. Which allows for quicker random access: we both agreed the array was quicker, since you didn't have to traverse the list in a linear fashion (O(n)), just provide an index (O(1)).
2. Which is quicker for comparing two lists of the same length: we both decided that if it was just primitive data types, the array would allow for a memcmp(), while the linked list required element-wise comparison plus dereferencing overhead.
3. Which allowed for more efficient caching if you were accessing the same element several times?
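For reference, the comparison of two same-length collections discussed above might be sketched as follows. The `struct dnode` type and both function names are made up for this example. One caveat worth knowing: `memcmp()` compares bit patterns, so for `double` it treats `0.0` and `-0.0` as different and two NaNs with identical bits as equal, unlike the `==` operator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list node holding one double. */
struct dnode {
    double value;
    struct dnode *next;
};

/* Arrays: one call over a contiguous block (bitwise comparison). */
bool arrays_equal(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n * sizeof *a) == 0;
}

/* Lists: walk both chains, dereferencing a pointer for every element. */
bool lists_equal(const struct dnode *a, const struct dnode *b)
{
    while (a != NULL && b != NULL) {
        if (a->value != b->value)
            return false;
        a = a->next;
        b = b->next;
    }
    return a == NULL && b == NULL;  /* equal only if both ended together */
}
```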
Point 3 is where our opinions differed. I contended that the CPU is going to try to cache the entire array, and if the array is obscenely large, it can't be stored in cache, and therefore there will be no caching benefit. With the linked list, individual elements can be cached. Therefore, linked lists lend themselves to cache "hits" more than static arrays do when dealing with a very large number of elements.
To the question: Which of the two is better for cache "hits", and can modern systems cache part of an array, or do they need the whole array or it won't try? Any sort of references to technical documents or standards I could also use to provide a definitive answer would help a lot.
Thanks!
The CPU doesn't know about your data structures. It caches more-or-less raw blocks of memory. Therefore, if you suppose you can access the same one element multiple times without traversing the list each time, then neither linked list nor array has a caching advantage over the other.
HOWEVER, arrays have a big advantage over dynamically-allocated linked lists for accessing multiple elements in sequence. Because CPU caches operate on blocks of memory rather larger than one double, when one array element is in the cache, it is likely that several others that reside at adjacent addresses are in the cache, too. Thus one (slow) read from main memory gives access to (fast) cached access to multiple adjacent array elements. The same is not true of linked lists, as nodes may be allocated anywhere in memory, and even a single node has at minimum the overhead of a next pointer to dilute the number of data elements that may be cached at the same time.
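The two traversal patterns can be put side by side in a short sketch. The function names and `struct dnode` are invented for the example. Assuming 64-byte cache lines (common, but hardware-dependent) and 8-byte doubles, one line fetched for the array loop brings in eight consecutive elements, whereas each step of the list loop may touch a different line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node: the payload shares its space with a pointer. */
struct dnode {
    double value;
    struct dnode *next;
};

/* Sequential array walk: consecutive addresses, so neighbouring
   elements tend to arrive in the same cache line. */
double sum_array(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* List walk: the next address is only known after loading the current
   node, and nodes may be scattered anywhere on the heap. */
double sum_list(const struct dnode *head)
{
    double total = 0.0;
    for (const struct dnode *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}
```

Both functions are O(n); the difference shows up only in measured run time, and how large it is depends on the allocator and the machine.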
Caches don't know about arrays, they just see memory accesses and store a little bit of the memory near that address. Once you've accessed something at an address it should stick around in the cache a while, regardless of whether that address belongs to an array or a linked list. But the cache controllers don't really know what's being accessed.
When you traverse an array, the cache system may pre-fetch the next bit of an array. This is usually heuristically driven (maybe with some compiler hints).
Some hardware and toolchains offer intrinsics that let you control cache residency (through pre-fetches, explicit flushes and so forth). Normally you don't need this kind of control, but for things like DSP code, resource-constrained game consoles and OS-level stuff that needs to worry about cache coherency it's pretty common to see people use this functionality.
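As one example of such an intrinsic, GCC and Clang provide `__builtin_prefetch`. The sketch below is illustrative only: the function name and the prefetch distance of 16 elements are guesses made for the example, not tuned values, and on a simple linear scan like this the hardware prefetcher will often do the job on its own.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements ahead to prefetch; 16 is an untuned guess. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that data further ahead will be read soon.
   The hint is compiled in only for GCC-compatible compilers. */
double sum_with_prefetch(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3); /* 0 = read, 3 = keep in cache */
#endif
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint; it never changes the result, only (possibly) the timing.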
Contents
Question Statement
Elaboration and Examples
Resources Visited
Answers (As they are posted)
Follow-up Questions (As they are conceived)
Question
How is the size of a primitive type considered when calculating the big-O style notation for 'space(memory) complexity'?
Elaboration and Examples
If I allocate an array N elements in length, and each element is a 32-bit integer, I have presumably allocated approximately N*32 bits. It is my understanding that the memory complexity of this allocation is considered O(N).
Using the above example, if I treat each element in the array as a pointer to a unique linked list, wherein the linked list is of length 1 (contains 1 node and a null pointer) and the data segment of that node is also a 32-bit integer, I am clearly now allocating:
32-bit array element
32-bit linked list data
32-bit linked list null pointer
Has my array become O(3*32*N)? I understand that this would still be considered O(N), but as you can see knowing the difference is relevant in cases where a time/memory tradeoff becomes relevant (e.g. I can use linked lists of various lengths with head pointers stored in the elements of the array to delay the point at which I must dynamically resize the array, since I can merely lengthen the linked lists - this amortizes the insert operation to O(1), but increases the memory complexity substantially until the resizing actually happens, wherein the linked lists would revert to elements in the array, and thus consume substantially less memory)
Resources already visited:
related questions on Stack Overflow:
Effect of memory usage in the complexity of an algorithm
Why is the complexity of A* exponential in memory?
And wikibooks had the following to say:
http://en.wikibooks.org/wiki/Data_Structures/Asymptotic_Notation
Additionally, wikipedia expounded regarding this topic in substantial detail:
http://en.wikipedia.org/wiki/Big_O_notation
Answers
Follow-up Questions
The purpose of big-O notation is that we don't ever have something like O(32*N). If the difference is really important, then the accepted convention is to not use big-O notation or to say something like 32*O(N).
I think your confusion stems from a misunderstanding about big-O notation. If you have an array of 32-bit ints, it definitely takes up less space than an array of pointers to singleton linked lists holding 32-bit integers (probably by a factor of 3 or 4, depending on what kind of linked list you have). However, asymptotically speaking, both of these setups require Θ(n) memory, since Θ notation talks about the asymptotic growth rate of the memory consumption and both approaches require memory linearly proportional to the number of elements being used.
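A small C sketch makes the constant factor visible. The names `struct cell`, `bytes_flat` and `bytes_chained` are made up for illustration, and the figures ignore whatever bookkeeping the allocator adds per `malloc` call. The exact ratio depends on pointer size and struct padding on the platform, but both functions grow linearly in `n`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singleton-list node: one 32-bit payload plus a next pointer. */
struct cell {
    int32_t value;
    struct cell *next;
};

/* Bytes for a plain array of n 32-bit integers. */
size_t bytes_flat(size_t n)
{
    return n * sizeof(int32_t);
}

/* Bytes for an array of n head pointers, each pointing at one node.
   Allocator overhead per node is not counted here. */
size_t bytes_chained(size_t n)
{
    return n * (sizeof(struct cell *) + sizeof(struct cell));
}
```

Doubling `n` doubles both results, which is all that Θ(n) promises; the ratio between them is the constant that the notation deliberately hides.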
Typically, asymptotic space complexity is used to rank different approaches against one another when the space usage is different. For example, one data structure with space complexity Θ(n log n) will in the long run always use more space than a data structure with space complexity Θ(n). However, two different data structures with space complexity Θ(n) might have wildly different memory footprints. As an example, consider the difference in memory footprint between a standard dynamic array and a Fibonacci heap. Both require Θ(n) memory, but the actual memory needed is hugely different (the Fibonacci heap probably needs 8 - 12x more memory).
Hope this helps!
Stacks are often implemented as linked lists. Is an array representation not good enough? With an array we can perform push and pop easily, while a linked list complicates the code and seems to have no advantage over the array implementation.
Can you give an example where a linked list implementation is more beneficial, or where we can't do without it?
I would say that many practical implementations of stacks are written using arrays. For example, the .NET Stack implementation uses an array as a backing store.
Arrays are typically more efficient because you can keep the stack nodes all nearby in contiguous memory that can fit nicely in your fast cache lines on the processor.
I imagine you see textbook implementations of stacks that use linked lists because they're easier to write and don't force you to write a little bit of extra code to manage the backing array store as well as come up with a growth/copy/reserve space heuristic.
In addition, if you're really pressed to use little memory, a linked list implementation might make sense since you don't "waste" space that's not currently used. However, on modern processors with plenty of memory, it's typically better to use arrays to gain the cache advantages they offer rather than worry about page faults with the linked list approach.
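For comparison, the textbook linked-list stack mentioned above can be sketched in a few lines of C. The names `struct snode`, `lstack_push` and `lstack_pop` are invented for the example. There is no capacity to manage and no growth heuristic, but every push costs one allocation and every element carries a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stack node; an empty stack is just a NULL top pointer. */
struct snode {
    int value;
    struct snode *next;
};

/* Push: allocate a node and make it the new top. False on allocation failure. */
bool lstack_push(struct snode **top, int value)
{
    assert(top != NULL);
    struct snode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->value = value;
    n->next = *top;
    *top = n;
    return true;
}

/* Pop: unlink the top node, hand back its value, free it. False if empty. */
bool lstack_pop(struct snode **top, int *out)
{
    assert(top != NULL && out != NULL);
    struct snode *n = *top;
    if (n == NULL)
        return false;
    *out = n->value;
    *top = n->next;
    free(n);
    return true;
}
```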
The size of an array is limited and fixed in advance. When you don't know how many elements there will be, a linked list is a good option.
A more detailed comparison (+ marks a point in favor of the linked list, - a point in favor of the array):
Size and type constraints:
(+) Array elements are laid out at equal intervals and need contiguous memory, whereas a linked list works with non-contiguous memory, which can be good for memory use when the data set is huge (avoids CPU polling for resources).
(+) Suppose you are using an array of type int as a stack. How would you then accommodate a double in it?
Portability:
(+) An array can cause errors such as index-out-of-bounds exceptions, whereas you can extend the chain of a linked list at any time.
Speed and performance:
(-) When it comes to performance, most array operations are O(1). With a linked list you have to pick a starting node and traverse from there, which adds a performance penalty.
When the size of the stack can vary greatly you waste space if you have generalized routines which always allocate a huge array.
Obviously, a fixed-size array has the limitation that the maximum size must be known beforehand.
If you consider a dynamic array, then Linked List vs. Arrays covers the details, including the complexities of the operations.
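A stack is often implemented using a linked list because push and pop are O(1) in the worst case. With a fixed-size array they are also O(1); with a dynamic array, a push that triggers a resize costs O(n), although the amortized cost is still O(1). The linked list also has the advantage of flexible size.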
What is the difference between an array and a list?
In C, an array is a fixed-size region of contiguous storage containing multiple objects, one after the other. This array is an "object" in the meaning which C gives to the word - basically just some memory that represents something. An object could just be an int.
You can distinguish slightly between array objects, and array types. Often people use array objects which are allocated with malloc, and used via a pointer to the first element. But C does also have specific types for arrays of different sizes, and also for variable-length-arrays, whose size is set when they are created. VLAs have a slightly misleading name: the size is only "variable" in the sense that it isn't fixed at compile time. It can't change during the lifetime of the object.
So, when I say an array is fixed-size I mean that the size cannot change once the array is created, and this includes VLAs. There is realloc, which logically returns a pointer to a new array that replaces the old one, but can sometimes return the same address passed in, having changed the size of the array in place. realloc operates on memory allocations, not on arrays in general.
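A minimal sketch of the `realloc` behaviour described here, with a made-up helper name `resize_ints`. The key points are that the returned address may or may not equal the old one, and that on failure the original block is left intact, so the result must not be assigned straight over the only copy of the old pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Resize the allocation behind *arr so that it holds new_count ints.
   On success, updates *arr and returns true. On failure, returns false
   and leaves *arr untouched (the old block is still valid). */
bool resize_ints(int **arr, size_t new_count)
{
    assert(arr != NULL && new_count > 0);
    int *p = realloc(*arr, new_count * sizeof **arr);
    if (p == NULL)
        return false;
    *arr = p;  /* may or may not be the same address as before */
    return true;
}
```

Existing elements are preserved up to the smaller of the old and new sizes; any pointers into the old block should be treated as invalid after a successful call.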
That's what an array is. The C programming language doesn't define anything called a list. Can't really compare something which is well defined, with something that isn't defined ;-) Usually "list" would mean a linked list, but in some contexts or in other languages it means other things.
For that matter, in other languages "array" could mean other things, although I can't immediately think of a language where it means anything very different from a C array.
If your question really has nothing to do with C, and is a language-agnostic data-structures question, "what is the difference between an array and a linked list?", then it's a duplicate of this:
Array versus linked-list
There is no such thing as a standard list in C. There is such a thing in C++, where it is implemented as a double-linked list.
The main differences are that arrays have random access - you can access any member of the array in O(1) time (e.g. if a was an array, a[4]) - and have a size that is fixed when the array is created (often at compile time). Linked lists have sequential access - to access an element, you have to loop through the list until you get to the element you want (e.g. if b was a linked list, to get to the 5th element of b you would have to iterate through elements 0, 1, 2, 3 and 4) - and their size can grow and shrink dynamically.
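The access difference can be shown in a few lines of C. `struct node` and `list_nth` are hypothetical names used only for this sketch; with an array the equivalent operation is just `a[4]`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Sequential access: follow `index` links from the head.
   Returns the node at that position, or NULL if the list is too short. */
const struct node *list_nth(const struct node *head, size_t index)
{
    while (head != NULL && index > 0) {
        head = head->next;
        index--;
    }
    return head;
}
```

Reaching index 4 therefore takes four pointer hops, while `a[4]` on an array is a single address computation regardless of the index.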
Although there is nothing like a list in C per se, you could well be talking about a linked list implementation.
Array: random access, predefined size.
Linked list: sequential access, size determined at runtime.
Other languages, such as Python, may have both lists and arrays built in, and their meanings may differ.
Useful comments from below:
You could add array lists. Lists which internally is an array which is doubled when needed and halved when only 1/4 full. This gives O(1) for add, remove, get(index) amortized. – lasseespeholt
Python's list is not a linked list. And the distinction between Python list and array is list can store anything while array can only store primitive types (int, float, etc). – KennyTM
An array has a fixed size, as when we write new int[100],
but a list does not have a fixed size... it can go on and on.
Insertion and deletion are easier in a list than in an array.
Reason: in a linked list we can insert and delete simply by changing pointers, but in an array insertion and deletion require shifting elements right or left.
A linked list can use a dummy head node to avoid the special cases of inserting into an empty list or removing the last node from a list of unit size, and it can use double links to allow iterating in both directions. The cost, of course, is the extra space needed to hold the dummy node (minimal cost) and the extra previous link in addition to the usual next link for each node (a much more significant cost).
In an array, we can place an element at any index thanks to random access.
In such a linked list, the reference to the tail node is simply header.prev, which gives us the ability to append to the list in constant time (without having to iterate to find the tail reference, or having to maintain a separate tail reference).
But in an array, we need to resize the array before inserting once it is full.
An array offers random access, unlike a linked list.
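The dummy-head, doubly linked layout described above could be sketched like this in C. The type and function names are invented for the example. Because the header links to itself when the list is empty, neither append nor remove needs a special case, and the tail is always `header->prev`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical doubly linked node; the header is a node whose value is unused. */
struct dlnode {
    int value;
    struct dlnode *prev;
    struct dlnode *next;
};

/* An empty list is a header that points at itself in both directions. */
void dlist_init(struct dlnode *header)
{
    assert(header != NULL);
    header->prev = header;
    header->next = header;
}

/* Append in O(1): the current tail is header->prev, no traversal needed.
   Returns the new node, or NULL if allocation fails. */
struct dlnode *dlist_append(struct dlnode *header, int value)
{
    assert(header != NULL);
    struct dlnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    struct dlnode *tail = header->prev;
    n->value = value;
    n->prev = tail;
    n->next = header;
    tail->next = n;
    header->prev = n;
    return n;
}

/* Unlink and free a node (never the header). The same two assignments
   work for the first, last, or only node. */
void dlist_remove(struct dlnode *n)
{
    assert(n != NULL && n->next != n);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    free(n);
}
```

The price, as noted above, is one extra pointer per node plus the header itself.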
A linked list has problems such as:
It consumes extra memory for the pointers it uses!
Accessing an element takes O(n) time, instead of O(1) as in an array.
Reverse traversal is difficult in a singly linked list, and if we use a doubly linked list, the additional pointer means even more extra memory.
Heap restrictions as well! Memory is allocated only if there is space available in the heap. If there is insufficient memory, the node cannot be allocated.
An array has problems such as:
a chance of memory wastage or shortage.
Hope this helps ! :)
An often underappreciated characteristic of linked data structures is that you can use them in situations where memory is highly fragmented, because there is no requirement that elements be contiguous in memory. For example, you could have 100MB of free space but a maximum run of contiguous free memory of only, say, 10MB. In this case you can create an array of at most 10MB, but potentially a larger linked list, since you would be able to make use of every run of free memory large enough to contain a single node.
An array holds elements of a single data type, i.e. it is homogeneous: we can have an array of strings, an array of integers, and so on. The size of an array is also predefined.
In a list, by contrast, we can have elements of any type, whether strings, integers, or a combination of both. Null and duplicate elements are also allowed in a list. Examples of lists include ArrayList and LinkedList. The size of a list can grow or shrink at any time.