How do I swap elements in a List? - c

I am trying to learn about data structures and algorithms on my own. I wrote my own double-linked list in C and now I want to write some algorithms to perform on the list. What is the preferred way to swap list items? Is it better to swap the content or to rearrange the pointers which point to the next and previous list item?

Rearrange the pointers. Swapping the data items can have side effects. In particular, you may have stored a reference to a node somewhere outside of the function, and usually when you rearrange the order of nodes in a list, you don't want people holding a reference to a node to suddenly find that the node points to new data. That's because generally the important, identifying characteristic of a node is the data it points to not its position in the list.

The canonical swap is done via pointer rearranging, has no side effects and is of course faster:
void swap (node *a, node *b) {
node *tmp;
tmp = a;
a = b;
b = tmp;
}

Depending on the kind of content being stored in the elements of the linked list, swapping the actual content of the element would be tricky (think about a linked list of different length strings for instance) so therefore it's easier to swap the pointers which point to the next and previous list items.

Depends on how you've allocated the content.
If you're storing the pointer to the content, then switching the content isn't a big deal. If you have a big structure that's part of your node, then switching the pointers could be more efficient than copying the whole content.

I tend to side with what most people have already said. A bit of background will probably help: swapping pointers is guaranteed to work whereas swapping objects may not always be as simple as it looks. Think of temporaries that may/will be created and exceptions (and I mean in general and not a C++ language feature way) can occur and actually leave your container (list) in an undesirable state. Look for invariants in your container -- which is that a swap should leave the list size intact as well as the elements intact and design thereon.

Is it better to swap the content or to rearrange the pointers which point to the next and previous list item?
Rearrange the pointers, because it might be much more expensive to swap all the elements of the nodes.
Swapping two pointers is constant operation in terms of Time Complexity.
However, if you'd like to swap all the elements between two nodes, then you would need access them all, and swap them with their counterparts, which is O(f), where f is the number of fields the struct representing the node has (i.e. the number of data a node has).
In other words, swapping node data is a linear operation, while swapping two pointers is constant.
Swapping the data may have side effects, especially when you may have stored a reference to a node somewhere outside of the function, which is already covered in the accepted answer, but I wanted to point out that fact too.

Related

Hashtable separate chaining in C

I’m building a hashtable in C using open hashing (separate chaining) to store words. I’m unconcerned by the order in which I will store words with the same hash key.
Currently, I have a pointer to a struct (struct dict * d) with my hashtable (struct item * arr). More specifically, this table is an array of items (struct item) containing a word (char * word) and a pointer (struct item * next).
I’m unclear about two aspects:
1. When chaining words together after collision (inserting new item), should I insert the
element at the beginning or at the end of the linked list?
I’ve seen it done both ways, but the latter seems more popular. However, the former seems quicker to me as I only need to set the pointer of my first item to my new item, and its pointer to null. I don’t have to do any pointer chasing (i.e. travel through my linked list until I find the null pointer).
2. Should my hashtable be an array of pointers to items (struct item), or
simply an array of items (struct item), as I have done?
In other words, should the very first item for a specific hash key be inserted in the first cell (an empty cell), or should there already be a pointer in that cell which we will point to this new item?
For 1. it really shouldn’t matter if you prepend or append to the list. If you keep the load small, the chains are short, and you shouldn’t see any noticeable difference in access performance. If you keep the table small and the load gets high, you might want to look into different strategies. The access pattern might matter then. For example, if you are more likely to look up recently inserted values, you want them early in the list, so then it is better to prepend. But with a hash table, it is better to keep the load small if you can, and then it shouldn’t matter.
For 2. either will also work. If your table is an array of pointers, NULL for empty chains, a simple recursive linked list implementation will work nicely. Make your list functions take a list as an argument and make insert and delete return a new list. Either argument or return value can be NULL. Then do something like tbl[bin] = insert(tbl[bin], val) or tbl[bin] = delete(tbl[bin], val). If the chains are short, you don’t have to worry too much about recursion overhead. In any case, you don’t need recursion for looking up a value or inserting if it is just prepending, so it is only delete where you don’t get tail recursion anyway. The benefit you get from having an array of links is either that you get a dummy element at the front of the list, which simplifies non-recursive list implementations by avoiding special cases for empty lists, or you avoid following a pointer to access the first element in the chain once you look up the bin. For the latter, you need a way to distinguish an empty chain from a chain with one element, though. It is hardly worth it, and if you want to avoid jumping along linked lists, open addressing or some other collision strategy might be better.

Array VS single linked list VS double link list

I am learning about arrays, single linked list and double linked list now a days and this question came that
" What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things "
As far I know array cannot be the answer because it has fixed size. If we want to insert a new thing. it wouldn't always be possible. Double linked list can do the task but there will be two pointers needed for each node so there will be memory problem, so I think single linked list will fulfill all given requirements. Am I right? Please correct me if I am missing any point. There is also one more question that instead of choosing one of them, can I make combination of one or more data structures given here to meet all the requirements?
"What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things".
As far as I can tell Arrays serve the purpose.
Fast search: You could do binary search if array is sorted. You dont get that option in linkedlist
Less memory: Arrays will take least memory (but contiguous memory )
Insertion: Inserting in array is a matter of a[i] = "value". If array size is exceeded then simply export data into a new array. That is exactly how HashMaps / ArrayLists work under covers.
Updating things: Only Arrays provide you with Random access. a[i] ="new value".. updated in O(1) time if you know the index.
Each of those has its own benefits and downsides.
For search speed, I'd say arrays are better suitable due to the quick lookup times.
Since an array is a sequence of same-size elements, retrieving the value at an index is just memoryLocation + index * elementSize. For a linked list, the whole list needs traversing.
Arrays also win in the "less memory" category, since there's no need to store extra pointers.
For insertions, arrays are slow. You'll need to traverse the array, copy contents to a new array, assign the new array, delete the old one...
Insertions go much quicker in linked- or double lists, because it's just a matter of changing one or two pointers.
In the end, it all just depends on the use case. Are you inserting a lot? Then you probably want to consider a non-array structure.
Do you need many quick lookups? Consider those arrays again. Etc..
See also this question.
A linked list is usually the best choice when we don’t know in advance the number of elements we will have to store or the number can change dynamically.
Arrays have slow insertion and deletion times. To insert an element to the front or middle of the array, the first step is to ensure that there is space in the array for the new element, otherwise, the array needs to be RESIZED. This is an expensive operation. The next step is to open space for the new element by shifting every element after the desired index. Likewise, for deletion, shifting is required after removing an element. This implies that insertion time for arrays is Big O of n (O(n)) as n elements must be shifted.
Using static arrays, we can save some extra memory in
comparison to linked lists because we do not need to store pointers to the next node
a doubly-linked list support fast insertion/removal at their ends. This is used in LRU cache, where you need to enter new item to front and remove the oldest item from the end.

Implementing arrays using linked lists (and vice versa)

After learning about arrays and linked lists in class, I'm curious about whether arrays can be used to create linked lists and vice versa.
To create a linked list using an array, could I store the first value of the linked list at index 0 of the array, the pointer to the next node at index 1 of the array, and so on? I guess I'm confused because the "pointer to next" seems redundant, given that we know that the index storing the value of the next node will always be: index of value of current node + 2.
I don't think it's possible to create an array using a linked list, because an array involves continuous memory, but a linked list can have nodes stored in different parts of computer memory. Is there some way to get around this?
Thanks in advance.
The array based linked list is generally defined in a 2-dimentional array, something like :
Benefit: The list will only take up to a specific amount of memory that is initially defined.
Down side: The list can only contain a specific predefined amount of items.
As a single linked list the data structure has to hold a head pointer. This data structure holds a head pointer however in this specific implementation it is a int. The head of the list is the pointer that holds the index to the first node. The first node holds the index to the next node and so on. The last node in the list will hold a next value of -1. This will indicate the end of the list. The fact that indices are taken as elements are added into the structure makes a requirement for a free list head. This free list is incorporated into the same 2-dementional array. Just as the head is an int the free list pointer is an int.
Now the data structure is composted of 3 major elements. The head pointer, the free head pointer and the 2-dimentional array. The list has to be initialized correctly to allow the list to be used. The list should be initialized as below.
Reference is this link
You could store a linked list in an array, but only in the sense that you have an ordered list. As you say, you do not need pointers as you know the order (it's explicit in the array ordering). The main differences for choosing between an array or linked list are:
Arrays are "static" in that items are fixed in their elements. You can't mremove an element and have the array automatically shuffle the following elements down. Of course you can bypass "empty" elements in your iteration it this requires specific logic. With a linked list, if you remove an element, it's gone. With an array, you have to shuffle all subsequent elements down.
As such, linked lists are often used where insertion/ deletion of elements is the most common activity. Arrays are most often used with access is required (as faster as directly accessed [by index]).
Another area where you may see benefits of linked lists over arrays is in sorting (where sorting is required or frequent). The reason for this being that linked list sorts require only pointer manipulation whereas aray sorting requires swapping and shuffling. That said, many sorting algorithms create new arrays anyway (merge-sort is typical) which reduces this overhead (though requires the same memory again for the sorted array).
You can mix your metaphors somewhat if, for example, you enable your linked list to be marked "read-only". That is, you could create an array of pointers to each node in your linked list. that way, you can have indexed access to your linked-list. The array becomes outdated (in the way described above) once elements are added or removed from your linked list (hence the read-only aspect).
So, to answer your specific questions:
1) There's no value in doing this - as per the details above
2) You haven't really provided enough information to answer the question of contiguous memory allocation. It depends on a lot: OS, architecture, compiler implementation. You haven't even mentioned the programming language. In short though, choosing between a linked list and array has little to do with contiguous memory allocation and more to do with usage. For instance, the java LinkedList class and ArrayList class both represent a List implementation but are specialised based on usage patterns. It is expected that LinkedList performs better for "lists" expecting high-modification (although in tests done a few years ago this proved to be negligible - I'm not sure of the state in the latest versions of java).
Also, you wouldn't typically "create an array with a linked list" or vice versa. They're both abstract data structures used in building larger components. They represent a list of something in a wider context (e.g. a department has a list of employees). Each datatype just has usage benefits. Similarly, you might use a set, queue, stack, etc. It depends on your usage needs.
I really hope I haven't confused you further!

binary seach tree index implementation using symbol tables

I am reading about index implementation using symbol tables in book by author Robert Sedwick in Algorithms in C++.
Below is snippet from the book
We can adapt binary search trees to build indices in precisely the
same manner as we provided indirection for sorting and for heaps.
Arrange for keys to be extracted from items via the key member
function, as usual. Moreover, we can use parallel arrays for the
links, as we did for linked lists. We use three arrays, one each for
the items, left links, and right links. The links are array indices
(integers), and we replace link references such as
x = x->l
in all our code with array references such as
x = l[x].
This approach avoids the cost of dynamic memory allocation for each
node—the items occupy an array without regard to the search function,
and we preallocate two integers per item to hold the tree links,
recognizing that we will need at least this amount of space when all
the items are in the search structure. The space for the links is not
always in use, but it is there for use by the search routine without
any time overhead for allocation. Another important feature of this
approach is that it allows extra arrays (extra information associated
with each node) to be added without the tree-manipulation code being
changed at all. When the search routine returns the index for an item,
it gives a way to access immediately all the information associated
with that item, by using the index to access an appropriate array.
This way of implementing BSTs to aid in searching large arrays of
items is sometimes useful, because it avoids the extra expense of
copying items into the internal representation of the ADT, and the
overhead of allocation and construction by new. The use of arrays is
not appropriate when space is at a premium and the symbol table grows
and shrinks markedly, particularly if it is difficult to estimate the
maximum size of the symbol table in advance. If no accurate size
prediction is possible, unused links might waste space in the item
array.
My questions on above text are
What does author mean by "we can use parallel arrays for the links as we did for linked lists" ? What does this statment mean and what are parallel arrays.
What does author mean links are array indices and we replace link references such x= x->l with x=l[x]?
What does author mean by "Another important feature of this approach is that it allows extra arrays (extra information associated with each node) to be added without the tree-manipulation code being changed at all." ?
You appear to have edited the text to take out the useful references. Either that or you have an earlier version of the text.
My third edition states that the index builds are covered in section 9.6, where it covers the process, and the parallel arrays are explained in chapter 3. The parallel arrays are simply storing the payload (the keys and possibly data that are held in the tree) and left/right pointers in three or more separate arrays, using the index to tie them together (x = left[x]). In that case, you may end up with something like:
int leftptr[100];
int rightptr[100];
char *payload[100];
and so on. In that example, node # 74 would have its data stored in payload[74], and the left and right "pointers" (actually indexes) stored in left[74] and right[74] respectively.
This is in contrast to having a single array of structures with the structure holding payload and pointers together (x = x->left;):
struct sNode {
struct sNode *left, right;
char payload[];
};
So, for your specific questions:
Parallel arrays are simply separating the tree structure information from the payload information and using the index to tie together information from those arrays.
Since you're using arrays for the links (and these arrays now hold array indexes rather than pointers), you no longer use x = x->left to move to the left child. Instead you use x = left[x].
The tree manipulation is only interested in the links. By having the links separated from the payload (and other possibly useful information), the code for manipulating tree structure can be simpler.
If you haven't already, you should flip back in the book to the section on linked-lists where he says the technique was used previously (it's probably explained there).
Parallel arrays means we don't have a struct to hold the node information.
struct node {
int data;
struct node *left;
struct node *right;
};
Instead, we have arrays.
int data[SIZE];
int left[SIZE];
int right[SIZE];
These are parallel arrays because we will use the same index to access the data and links. The node is represented in our code by an index, not a pointer. So for node 4, the data is at
data[4];
The left link is at
left[4];
Adding more information at the node can be done by creating yet another array of the same size.
int extra[SIZE];
The extra data for node 4 will be at
extra[4];

Why are linked lists faster than arrays?

I am very puzzled about this. Everywhere there is written "linked lists are faster than arrays" but no one makes the effort to say WHY. Using plain logic I can't understand how a linked list can be faster. In an array all cells are next to each other so as long as you know the size of each cell it's easy to reach one cell instantly. For example if there is a list of 10 integers and I want to get the value in the fourth cell then I just go directly to the start of the array+24 bytes and read 8 bytes from there.
In the other hand when you have a linked list and you want to get the element in the fourth place then you have to start from the beginning or end of the list(depending on if it's a single or double list) and go from one node to the other until you find what you're looking for.
So how the heck can going step by step be faster than going directly to an element?
This question title is misleading.
It asserts that linked lists are faster than arrays without limiting the scope well. There are a number of times when arrays can be significantly faster and there are a number of times when a linked list can be significantly faster: the particular case of linked lists "being faster" does not appear to be supported.
There are two things to consider:
The theoretical bounds of linked-lists vs. arrays in a particular operation; and
the real-world implementation and usage pattern including cache-locality and allocations.
As far as the access of an indexed element: The operation is O(1) in an array and as pointed out, is very fast (just an offset). The operation is O(k) in a linked list (where k is the index and may always be << n, depending) but if the linked list is already being traversed then this is O(1) per step which is "the same" as an array. If an array traversal (for(i=0;i<len;i++) is faster (or slower) depends upon particular implementation/language/run-time.
However, if there is a specific case where the array is not faster for either of the above operations (seek or traversal), it would be interesting to see to be dissected in more detail. (I am sure it is possible to find a language with a very degenerate implementation of arrays over lists cough Haskell cough)
Happy coding.
My simple usage summary: Arrays are good for indexed access and operations which involve swapping elements. The non-amortized re-size operation and extra slack (if required), however, may be rather costly. Linked lists amortize the re-sizing (and trade slack for a "pointer" per-cell) and can often excel at operations like "chopping out or inserting a bunch of elements". In the end they are different data-structures and should be treated as such.
Like most problems in programming, context is everything. You need to think about the expected access patterns of your data, and then design your storage system appropriately. If you insert something once, and then access it 1,000,000 times, then who cares what the insert cost is? On the other hand, if you insert/delete as often as you read, then those costs drive the decision.
Depends on which operation you are referring to. Adding or removing elements is a lot faster in a linked list than in an array.
Iterating sequentially over the list one by one is more or less the same speed in a linked list and an array.
Getting one specific element in the middle is a lot faster in an array.
And the array might waste space, because very often when expanding the array, more elements are allocated than needed at that point in time (think ArrayList in Java).
So you need to choose your data structure depending on what you want to do:
many insertions and iterating sequentially --> use a LinkedList
random access and ideally a predefined size --> use an array
Because no memory is moved when insertion is made in the middle of the array.
For the case you presented, its true - arrays are faster, you need arithmetic only to go from one element to another. Linked list require indirection and fragments memory.
The key is to know what structure to use and when.
Linked lists are preferable over arrays when:
a) you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
b) you don't know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big
c) you don't need random access to any elements
d) you want to be able to insert items in the middle of the list (such as a priority queue)
Arrays are preferable when:
a) you need indexed/random access to elements
b) you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
c) you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas you need to lookup the node based on the pointer for each element in linked list, which may result in page faults which may result in performance hits.
d) memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
Array Lists (like those in .Net) give you the benefits of arrays, but dynamically allocate resources for you so that you don't need to worry too much about list size and you can delete items at any index without any effort or re-shuffling elements around. Performance-wise, arraylists are slower than raw arrays.
Reference:
Lamar answer
https://stackoverflow.com/a/393578/6249148
LinkedList is Node-based meaning that data is randomly placed in memory and is linked together by nodes (objects that point to another, rather than being next to one another)
Array is a set of similar data objects stored in sequential memory locations
The advantage of a linked list is that data doesn’t have to be sequential in memory. When you add/remove an element, you are simply changing the pointer of a node to point to a different node, not actually moving elements around. If you don’t have to add elements towards the end of the list, then accessing data is faster, due to iterating over less elements. However there are variations to the LinkedList such as a DoublyLinkedList which point to previous and next nodes.
The advantage of an array is that yes you can access any element O(1) time if you know the index, but if you don’t know the index, then you will have to iterate over the data.
The down side of an array is the fact that its data is stored sequentially in memory. If you want to insert an element at index 1, then you have to move every single element to the right. Also, the array has to keep resizing itself as it grows, basically copying itself in order to make a new array with a larger capacity. If you want to remove an element in the begging, then you will have to move all the elements to left.
Arrays are good when you know the index, but are costly as they grow.
The reason why people talk highly about linked lists is because the most useful and efficient data structures are node based.

Resources