Welcome, my friend,
For some homework of mine I need a Graph ADT. However, I'd like it to be, how do I say, generic; that is to say, I want to store whatever I fancy in it.
The issue I'm facing has to do with complexity: what data structure should I use to represent the set of nodes? I forgot to say that I have already decided to use the adjacency-list technique.
Generally, textbooks mention a linked list, but it is my understanding that wherever a linked list is useful and we also need to perform searches, a tree is better.
Then again, what we need is to associate a node with its list of adjacent nodes, so what about a hash table?
Can you help me decide which data structure (linked list, tree, hash table) I should store the nodes in?
...the Graph ADT. However, I'd like it to be, how do I say, generic; that is to say, I want to store whatever I fancy in it.
That's basically the point of an ADT (Abstract Data Type).
Regarding which data structure to use, any of them can work; it is a matter of trade-offs.
For the set of nodes, a hash table would be a good option (if you have a good C implementation of one): it gives you expected O(1) access to any node.
A linked list will take O(n) worst case to find a node. A balanced tree gives O(log n) and offers no advantage over the hash table, unless for some reason you need the set of nodes in sorted order a LOT (an in-order traversal of a balanced BST yields the nodes in sorted order in O(n) time).
Regarding adjacency lists for each node, it depends what you want to do with the graph.
If you will only implement, say, DFS and BFS, you just need to iterate over all neighbors of a given node, so a linked list is the simplest way and it is sufficient.
But if you need to check whether a specific edge exists, that takes O(n) worst case, because you have to scan the whole list (an adjacency-matrix representation would make this operation O(1)).
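As a minimal sketch of the hash-table-plus-adjacency-list layout described above (assuming integer node ids for brevity and a fixed bucket count; a truly generic version would carry a void * payload and a user-supplied hash function):

```c
#include <stdlib.h>
#include <stddef.h>

#define NBUCKETS 64

typedef struct Adj {            /* one entry in a node's adjacency list */
    int to;
    struct Adj *next;
} Adj;

typedef struct Node {           /* one vertex, chained in a hash bucket */
    int id;
    Adj *adj;
    struct Node *next;
} Node;

typedef struct {
    Node *bucket[NBUCKETS];
} Graph;

static unsigned hash_id(int id) { return (unsigned)id % NBUCKETS; }

/* expected O(1): walk one (short) bucket chain */
Node *graph_find(Graph *g, int id) {
    for (Node *n = g->bucket[hash_id(id)]; n; n = n->next)
        if (n->id == id)
            return n;
    return NULL;
}

Node *graph_add_node(Graph *g, int id) {
    Node *n = graph_find(g, id);
    if (n) return n;
    n = calloc(1, sizeof *n);
    n->id = id;
    n->next = g->bucket[hash_id(id)];
    g->bucket[hash_id(id)] = n;
    return n;
}

/* directed edge from -> to; O(1) after the hash lookups */
void graph_add_edge(Graph *g, int from, int to) {
    Node *f = graph_add_node(g, from);
    graph_add_node(g, to);
    Adj *a = malloc(sizeof *a);
    a->to = to;
    a->next = f->adj;
    f->adj = a;
}

/* worst case O(deg(from)): scan the adjacency list */
int graph_has_edge(Graph *g, int from, int to) {
    Node *f = graph_find(g, from);
    if (!f) return 0;
    for (Adj *a = f->adj; a; a = a->next)
        if (a->to == to)
            return 1;
    return 0;
}
```

Note how the two costs from the answer show up directly: node lookup is a short bucket chain, while the edge test has to walk the whole adjacency list of the source node.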
A linked list of adjacent nodes can be sufficient; it depends on what you are going to do.
If you need to know what nodes are adjacent to one another, you could use an adjacency matrix. In other words, for a graph of n nodes, you have an n x n matrix whose entry for (i,j) is 1 if i and j are next to each other in the graph.
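A sketch of that matrix layout, assuming nodes are numbered 0..n-1 and the graph is undirected (a single flat allocation indexed as i*n + j):

```c
#include <stdlib.h>

typedef struct {
    int n;
    unsigned char *m;   /* n*n entries; m[i*n + j] == 1 iff edge (i, j) */
} Matrix;

Matrix *matrix_new(int n) {
    Matrix *g = malloc(sizeof *g);
    g->n = n;
    g->m = calloc((size_t)n * (size_t)n, 1);   /* all entries start at 0 */
    return g;
}

void matrix_add_edge(Matrix *g, int i, int j) {
    g->m[i * g->n + j] = 1;
    g->m[j * g->n + i] = 1;   /* undirected: mirror the entry */
}

/* O(1) edge test, at the price of O(n^2) memory */
int matrix_has_edge(const Matrix *g, int i, int j) {
    return g->m[i * g->n + j];
}
```

The trade-off against the adjacency list is exactly the one discussed above: constant-time edge queries, but O(n²) space even for sparse graphs.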
I know that an adjacency list is a common data structure for representing a graph, using an array of linked lists. I am working on implementing an inverted index for a simple search engine in C, and was going to use an adjacency list. However, I have found one disadvantage: if you don't know how many words are going to be in the inverted index, you have to assume some arbitrarily large number of words (array elements) up front in order to create the adjacency list. This could result in excess memory being used. It is not a huge issue, but I was wondering if there is a better way to implement it.
I was thinking that one solution would be to use a linked list of linked lists to represent my inverted index instead. I haven't seen many examples of a linked-list-of-linked-lists graph representation, so I assume it is not a common or conventional representation. I would like to know whether it is appropriate to use a linked list of linked lists to represent a graph in general, or whether it is better to stick with the array-based adjacency list. Any insights would be really appreciated.
There is a trade-off to both approaches.
You can use an exactly-sized array for the adjacency list without wasting any memory, but then every vertex insertion and deletion costs O(|V|) time, because the array has to be reallocated.
If you instead, for instance, double the length of your array whenever it fills up, you only pay O(|V|) each time your vertex count doubles, so appending a vertex becomes amortized O(1). However, you will almost always be holding some extra, unused memory with this approach.
If you choose to represent the graph as a linked list of linked lists, you do indeed optimize memory, but at a large performance cost: merely locating a vertex's adjacency list goes from O(1) array indexing to an O(|V|) walk down the outer list, which eliminates one of the biggest advantages of an adjacency list.
And if you would like to do a more advanced operation, like traversing the graph, the cost compounds: for every neighbour of a vertex you would have to re-walk the outer node list in order to find that neighbour's own entry.
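The doubling-array idea is small enough to sketch directly (a hypothetical `VertexArray` holding int ids as a stand-in for per-vertex data; growth is the only interesting part):

```c
#include <stdlib.h>

typedef struct {
    int *v;       /* vertex ids (stand-in for real per-vertex data) */
    size_t len;
    size_t cap;
} VertexArray;

/* amortized O(1): reallocation happens only when the array is full,
 * and each doubling pays for all the cheap appends since the last one */
void va_push(VertexArray *a, int id) {
    if (a->len == a->cap) {
        a->cap = a->cap ? a->cap * 2 : 4;
        a->v = realloc(a->v, a->cap * sizeof *a->v);
    }
    a->v[a->len++] = id;
}
```

This is the usual dynamic-array compromise: you hold at most twice the needed memory, but appends no longer cost O(|V|) each.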
I would like to implement a data structure that supports fast insertion and keeps the data sorted, without duplicates, after every insert.
I thought about a binomial heap, but as I understand that structure, it cannot tell during insertion whether a particular element is already in the heap. On the other hand there is the AVL tree, which fits my case perfectly, but honestly it is rather too hard for me to implement at the moment.
So my question is: is there any possibility to modify the binomial heap insertion algorithm to skip duplicates? Or maybe someone could suggest another structure?
Greetings :)
In C++ there is std::set. It is internally implemented as a red-black tree, so it keeps the data sorted as you insert it. You can have a look at it for reference.
A good data structure for this is the red-black tree, which has O(log n) insertion. You said you would like to implement such a data structure yourself; a good explanation of how to do that is given here, along with an open-source usable library.
If you're okay with using a library, you may take a look at libavl here.
The library implements some other varieties of binary trees as well.
Skip lists are also a possibility, especially if you are concerned with thread safety. Balanced binary search trees perform worse than skip lists under concurrent access, because skip lists require no rebalancing, and a skip list is inherently sorted just like a BST. There is a disadvantage in the amount of memory required (multiple linked lists are effectively layered on top of each other), but theoretically speaking it is a good fit.
You can read more about skip lists in this tutorial.
If you have a truly large number of elements, you might also consider just using a doubly linked list and sorting the list once after all items are inserted. This has the benefit of ease of implementation and O(1) insertion time.
You would then need to implement a sorting algorithm. A selection sort or insertion sort would be slower but easier to implement than a mergesort, heapsort, or quicksort algorithm. On the other hand, the latter three are not terribly difficult to implement either. The only thing to be careful about is that you don't overflow the stack since those algorithms are typically implemented using recursion. You could create your own stack implementation (not difficult) and implement them iteratively, pushing and popping values onto your stack as necessary. See Iterative quicksort for an example of what I'm referring to.
If you are looking for fast insertion and easy implementation, why not a linked list (singly or doubly)?
insertion: push head / push tail, O(1)
removal: pop head / pop tail, O(1)
The only BUT is that "find" will be O(n).
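That O(n) "find" is exactly what a sorted, duplicate-free insert runs into: you must scan for the insertion point anyway, so skipping duplicates comes for free. A sketch of such an insert on a singly linked list (the pointer-to-pointer walk is a common C idiom that avoids special-casing the head):

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct SNode {
    int val;
    struct SNode *next;
} SNode;

/* O(n) insert that keeps the list sorted ascending and skips duplicates;
 * returns the (possibly new) head */
SNode *sorted_insert(SNode *head, int val) {
    SNode **pp = &head;
    while (*pp && (*pp)->val < val)
        pp = &(*pp)->next;
    if (*pp && (*pp)->val == val)    /* duplicate: leave list unchanged */
        return head;
    SNode *n = malloc(sizeof *n);
    n->val = val;
    n->next = *pp;
    *pp = n;
    return head;
}

size_t list_len(const SNode *n) {
    size_t k = 0;
    for (; n; n = n->next) k++;
    return k;
}
```

This matches the trade-off discussed above: every insert is O(n), where an AVL/red-black tree would make it O(log n).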
It's my understanding that doubly linked lists use more memory but less CPU and usually provide better algorithm complexity compared to simple linked lists.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list. Is there a clear point/situation after which using one instead of the other is unarguably the best solution? (Like after x elements on a regular PC)
My specific problem:
I'm implementing a linked-list structure for general use and am wondering whether or not a back link should be included, as it would greatly decrease the complexity of element removal.
Thanks.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
Choosing a data structure is about weighing costs vs benefits, usually work vs upkeep.
A singly linked list offers easy traversal and easy front insertion. Easy tail insertion can be had at the cost of tracking the last node. That cost means whenever you add/remove a node you have to do an additional check (is this the tail?) and a possible update of the list structure.
A doubly linked list adds a lot more maintenance overhead. Every node now has to store two pointers and those have to be managed and maintained.
If you never need to walk backwards in your list, then a singly linked list is ideal, but not being able to walk backwards means that removal is more expensive.
So you need to start by determining which usage pattern you're going to have. If you are building a one-use list, then a singly linked list may be ideal. If you are building a dynamic list with a high rate of removals, then a doubly linked list will be better suited.
Determining specific costs of operations your data structure offers is the topic of things like 'Big O Notation'.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list.
When you don't have to go backwards.
If you're doing a one way linear search, there's no incentive to traverse the list the other direction, and you wouldn't need those pointers.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
This problem doesn't have much to do with whether the list is singly or doubly linked. If you have to delete something from a list, you first need to find it, which is O(n) either way; the extra pointer to the previous node doesn't speed up the search. It only helps once the node is found: with it, unlinking is O(1), whereas a singly linked list still has to know the predecessor.
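That last distinction is worth seeing in code. A sketch of O(1) removal from a doubly linked list, given a pointer to the node itself (names like `dlist_remove` are illustrative, not from any particular library):

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct DNode {
    int val;
    struct DNode *prev, *next;
} DNode;

DNode *dlist_push(DNode **head, int val) {
    DNode *n = calloc(1, sizeof *n);
    n->val = val;
    n->next = *head;
    if (*head) (*head)->prev = n;
    *head = n;
    return n;
}

/* O(1): the node carries its own predecessor pointer, so no scan is
 * needed; a singly linked list would need an O(n) walk to find it */
void dlist_remove(DNode **head, DNode *n) {
    if (n->prev) n->prev->next = n->next;
    else         *head = n->next;
    if (n->next) n->next->prev = n->prev;
    free(n);
}
```

So the back link pays off precisely when callers hold on to node pointers (iterators) and remove through them, rather than removing by value.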
A linked list is a data structure most often used to implement a stack or a queue. Generally speaking, if you are implementing a stack, insertion and deletion happen at a single end; if a queue is used, you usually delete at one end and add at the other. As you can see, neither abstract data structure needs a doubly linked list, and the add and delete operations don't depend on the number of items.
As Kepani mentioned above, the only case where you would worry about the number of elements in the list is when you delete in a fashion not described by a stack/queue interface (a non-linear approach), i.e. when the elements are ordered (they can be otherwise, of course) and that order needs to be maintained.
A doubly linked list is definitely harder on the memory requirement, as each node has to maintain an "extra" pointer. On the other hand, it is handy when you need to refer back to past values (e.g. the history in a CLI command-line interpreter).
A singly linked list is harder on time, as reaching the value before the current node requires a traversal whose length depends on the number of elements in the list.
Suppose there is a linked list holding some information. My program's architecture rests upon being able to find out whether certain information already exists in an individual node of this linked list.
(I.e., see whether the integer member int example already has the value 5 in some node.)
After that, each node can be systematically operated on.
I can think of various ways to do this, but I was hoping someone more experienced might show something fairly efficient.
Also, is this good practice, or should there be another, more suitable data structure out there?
Thanks!
A linear scan of the list is O(n); if that is not fast enough, a sorted array with binary search, or a BST, would give you O(log n). Alternatively you can look at the hash map data structure, which gives you near-constant-time lookup by key but is more complicated than the other options; the problem is that the C standard library provides no implementation of one.
Otherwise, checking each element in turn is the best you can hope to do with a linked list.
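The linear scan itself is short. A sketch, using a hypothetical node type with the `int example` member from the question:

```c
#include <stddef.h>

typedef struct INode {
    int example;            /* the field the question asks about */
    struct INode *next;
} INode;

/* O(n) worst case: visit every node until a match is found;
 * returns the matching node, or NULL if the value is absent */
const INode *find_value(const INode *head, int wanted) {
    for (const INode *n = head; n != NULL; n = n->next)
        if (n->example == wanted)
            return n;
    return NULL;
}
```

Returning the node (rather than a boolean) supports the "each node can then be systematically operated on" part of the question.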
I need to add an unknown number of times (timestamps) to a data structure from pthreads and keep them ordered earliest first. Can anybody recommend a good structure (linked list / array list) for this?
A linked list will be O(n) in finding the place where the new object is to go, but constant in inserting it.
A dynamic array/array list will be O(log(n)) to find the right place (binary search), but worst case O(n) to insert, since you'll need to shift every value past the insertion point one slot over.
If you don't need random access, or at least not until the end, you could use a heap, O(log(n)) insertion, after you're done you can pull them out in O(log(n)) each, so O(n*log(n)) for all of them.
And it's possible there's a (probably tree-based) structure that can do all of it in O(log(n)) (red-black tree?).
So, in the end it boils down to how, precisely, you want to use it.
Edit: Looked up red-black trees, and they offer O(log(n)) search, insertion, and deletion (the rebalancing work after an insert or delete is "amortized O(1)", according to Wikipedia), so that may be what you want.
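The heap option above is simple enough to sketch. A minimal array-backed min-heap over long values (standing in for timestamps); `heap_pop` assumes the heap is non-empty:

```c
#include <stdlib.h>

typedef struct {
    long *a;            /* e.g. timestamps, earliest at a[0] */
    size_t len, cap;
} MinHeap;

/* O(log n): append, then sift the new value up to its place */
void heap_push(MinHeap *h, long v) {
    if (h->len == h->cap) {
        h->cap = h->cap ? h->cap * 2 : 8;
        h->a = realloc(h->a, h->cap * sizeof *h->a);
    }
    size_t i = h->len++;
    h->a[i] = v;
    while (i > 0 && h->a[(i - 1) / 2] > h->a[i]) {
        long t = h->a[i];
        h->a[i] = h->a[(i - 1) / 2];
        h->a[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

/* O(log n): remove the minimum, move the last element to the root,
 * then sift it down; caller must ensure h->len > 0 */
long heap_pop(MinHeap *h) {
    long min = h->a[0];
    h->a[0] = h->a[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, s = i;
        if (l < h->len && h->a[l] < h->a[s]) s = l;
        if (r < h->len && h->a[r] < h->a[s]) s = r;
        if (s == i) break;
        long t = h->a[i]; h->a[i] = h->a[s]; h->a[s] = t;
        i = s;
    }
    return min;
}
```

Popping everything yields the times earliest-first in O(n log n) total, as the answer says. (Any locking needed to share this between pthreads is deliberately omitted here.)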
If you just need the order at the end, use a linked list to store the times, maintaining a count of records added. Then create an array of size count, copying the elements into the newly created array and deleting them from the list.
Finally, sort the array using qsort.
If you need to maintain an ordered list at all times, use a heap.
The former approach has the following complexity:
O(n) for insertion
O(n log(n)) for sorting
The latter approach has:
O(n log(n)) for insertion and fetching
You can also look at a priority queue.
Please note that if you are open to using the STL, you can go for std::priority_queue.
In terms of memory, the latter would consume more, because you have to store two pointers per node (assuming a pointer-based heap; an array-backed heap avoids this).
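The collect-then-qsort half of this answer can be sketched as follows (the `TNode` list type and `sort_times` helper are illustrative; the comparator avoids the classic `x - y` overflow bug):

```c
#include <stdlib.h>
#include <time.h>

typedef struct TNode {
    time_t when;
    struct TNode *next;
} TNode;

static int cmp_time(const void *a, const void *b) {
    time_t x = *(const time_t *)a, y = *(const time_t *)b;
    return (x > y) - (x < y);   /* -1, 0, or 1; never overflows */
}

/* copy up to `count` entries of the list into a fresh array and
 * sort it earliest-first; caller frees the result */
time_t *sort_times(const TNode *head, size_t count) {
    time_t *out = malloc(count * sizeof *out);
    size_t i = 0;
    for (const TNode *n = head; n && i < count; n = n->next)
        out[i++] = n->when;
    qsort(out, i, sizeof *out, cmp_time);
    return out;
}
```

Insertion into the list stays O(1) per pthread-produced value (with whatever locking you use around it), and the single qsort at the end costs O(n log n).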