Graph Representation - Linked List of Linked Lists - c

I know that an adjacency list is a common data structure for representing a graph: an array of linked lists. I am implementing an inverted index for a simple search engine in C and was going to use an adjacency list. However, one disadvantage I have found is that if you don't know how many words the inverted index will contain, you have to size the array for some arbitrarily large number of words, which can waste memory. It is not a huge issue, but I was wondering if there is a better way to implement it.
I was thinking that one solution would be to represent my inverted index as a linked list of linked lists instead. I haven't seen many examples of a linked-list-of-linked-lists graph representation, so I assume it is not commonly used or conventional. Is it appropriate to use a linked list of linked lists to represent a graph in general, or is it better to stick with an adjacency list? Any insights would be really appreciated.

There is a trade-off to both approaches.
You can keep the array exactly the size of your vertex set, holding no extra memory, but then every vertex insertion and deletion forces a reallocation and copy costing O(|V|) time.
If instead you double the length of the array whenever it fills, the O(|V|) copy happens only each time your vertex count doubles, so insertion is amortized O(1). The price is that you will almost always be holding some unused memory with this approach.
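The doubling strategy can be sketched as a growable array of adjacency-list heads; a minimal sketch, with illustrative names (`AdjList`, `adj_add_vertex`):

```c
#include <stdlib.h>

/* Illustrative sketch of a growable adjacency list: an array of
 * list heads that doubles its capacity when full. */

typedef struct Edge {
    int to;
    struct Edge *next;
} Edge;

typedef struct {
    Edge **heads;    /* heads[v] is vertex v's list of neighbours */
    int nvertices;
    int capacity;
} AdjList;

/* Amortized O(1): the O(|V|) copy inside realloc only happens
 * when the array doubles. Returns the new vertex id, or -1. */
static int adj_add_vertex(AdjList *g) {
    if (g->nvertices == g->capacity) {
        int newcap = g->capacity ? g->capacity * 2 : 4;
        Edge **grown = realloc(g->heads, newcap * sizeof *grown);
        if (!grown) return -1;
        g->heads = grown;
        g->capacity = newcap;
    }
    g->heads[g->nvertices] = NULL;
    return g->nvertices++;
}
```

The unused slots between `nvertices` and `capacity` are exactly the "extra memory" this trade-off buys speed with.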
If you choose to represent a graph as a linked list of linked lists, you do save that memory, but at a large performance cost: finding the neighbour list of a given node goes from O(1) (an array index) to O(|V|) (a scan of the outer node list), which eliminates one of the biggest advantages of an adjacency list.
And if you would like to do a more advanced operation, like traversing the graph, the cost becomes extreme: for every neighbour of a vertex you would have to re-traverse the outer node list just to locate that neighbouring vertex, turning an O(|V| + |E|) traversal into roughly an O(|V||E|) one.
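The linked-list-of-linked-lists index the question describes might look something like this in C (a hypothetical sketch; `WordNode`, `Posting`, and `index_add` are made-up names, not from any library):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Outer list of words, each holding an inner list of postings. */
typedef struct Posting {
    int doc_id;
    struct Posting *next;
} Posting;

typedef struct WordNode {
    char word[32];
    Posting *postings;      /* inner list: documents containing the word */
    struct WordNode *next;  /* outer list: next word in the index */
} WordNode;

/* O(n) scan of the outer list to find (or create) the word -- this is
 * the cost discussed above -- then O(1) push onto the inner list. */
static WordNode *index_add(WordNode **head, const char *word, int doc_id) {
    WordNode *w = *head;
    while (w && strcmp(w->word, word) != 0)
        w = w->next;
    if (!w) {                        /* word not seen yet: grow outer list */
        w = malloc(sizeof *w);
        if (!w) return NULL;
        snprintf(w->word, sizeof w->word, "%s", word);
        w->postings = NULL;
        w->next = *head;
        *head = w;
    }
    Posting *p = malloc(sizeof *p);  /* O(1) insertion into inner list */
    if (!p) return NULL;
    p->doc_id = doc_id;
    p->next = w->postings;
    w->postings = p;
    return w;
}
```

No up-front sizing is needed, but every lookup of a word pays a linear scan of the outer list.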

Related

Time Efficiency of mergesort on linked list vs array of pointers

I am trying to figure out the time efficiency of mergesort on a linked list versus an array of pointers (not worrying about how I will use the data later, solely the speed at which it gets sorted).
Which would be faster? I imagine using an array of pointers requires an additional layer of memory access.
But at the same time, accessing a linked list would be slower: even assuming we go in already knowing the linked list's length, mergesort still requires iterating through the list, jumping from node to node, until you get a pointer to the middle node, which I think takes more time than indexing an array.
Does anyone have any insights? Is it more contextual to the data being sorted?
The primary difference between implementing merge sort on a linked list versus an array of pointers is that with the array you end up having to use a secondary array. The algorithmic complexity is the same, O(n log n), but the array version uses O(n) extra memory. You don't need that extra memory in the linked-list case.
In real world implementation, runtime performance of the two should differ by a constant factor, but not enough to favor one over the other. That is, if you have an array of pointers, you probably won't benefit from turning it into a linked list, sorting, and converting it back to an array. Nor would you, given a linked list, benefit from creating an array, sorting it, and then building a new array.
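A top-down merge sort on a singly linked list can be sketched as follows (a hedged illustration; `Node`, `split`, `merge`, and `msort` are made-up names). It shows both points above: the merge re-points existing links instead of using a secondary array, and finding the middle requires the pointer-chasing walk the question asks about:

```c
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Slow/fast pointer walk to the middle: this linear scan is the
 * extra pointer-chasing cost of the linked-list version. */
static Node *split(Node *head) {
    Node *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node *second = slow->next;
    slow->next = NULL;
    return second;
}

/* Merges two sorted lists by re-linking nodes: no extra array. */
static Node *merge(Node *a, Node *b) {
    Node dummy = {0, NULL}, *tail = &dummy;
    while (a && b) {
        if (a->value <= b->value) { tail->next = a; a = a->next; }
        else                      { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

static Node *msort(Node *head) {
    if (!head || !head->next) return head;
    Node *second = split(head);
    return merge(msort(head), msort(second));
}
```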

How to determine the point at which a doubly linked list becomes a better solution than a simple linked list?

It's my understanding that doubly linked lists use more memory but less CPU and usually provide better algorithm complexity compared to simple linked lists.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list. Is there a clear point/situation after which using one instead of the other is unarguably the best solution? (Like after x elements on a regular PC)
My specific problem:
I'm implementing a linked list structure for general use and am thinking whether or not a link back should be included as it would greatly decrease complexity for element removal.
Thanks.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
Choosing a data structure is about weighing costs vs benefits, usually work vs upkeep.
A singly linked list offers easy traversal and easy front insertion. Easy tail insertion can be had at the cost of tracking the last node. That cost means whenever you add/remove a node you have to do an additional check (is this the tail?) and a possible update of the list structure.
A doubly linked list adds a lot more maintenance overhead. Every node now has to store two pointers and those have to be managed and maintained.
If you never need to walk backwards in your list, then a singly linked list is ideal, but not being able to walk backwards means that removal is more expensive.
So you need to start by determining which usage pattern you're going to have. If you are building a one-use list, then a singly linked list may be ideal. If you are building a dynamic list with a high rate of removals, then a doubly linked list will be better suited.
Determining specific costs of operations your data structure offers is the topic of things like 'Big O Notation'.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list.
When you don't have to go backwards.
If you're doing a one way linear search, there's no incentive to traverse the list the other direction, and you wouldn't need those pointers.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
This problem doesn't have anything to do with whether a list is singly or doubly linked. If you have to find the element before deleting it, the search is O(n) either way, and an extra pointer to the previous node doesn't speed up the search. (It does help if you already hold a pointer to the node to be deleted: a doubly linked list can unlink a known node in O(1), while a singly linked list must first locate its predecessor.)
A linked list is a data structure that, most of the time, is used to implement a stack or a queue. Generally speaking, if you are implementing a stack, insertion and deletion happen at a single end; if a queue is used, we usually delete at one end and add at the other. As you can see, neither of these abstract data structures needs a doubly linked list, and their add and delete operations don't depend on the number of items.
As mentioned above by Kepani, the only case where you would be worried about the number of elements in the list is when you delete in a fashion not described by a stack/queue interface (a non-linear approach), i.e. when the elements are ordered (they can be otherwise, of course) and that order needs to be maintained.
A doubly linked list is definitely harder on the memory requirement, as each node needs to maintain an "extra" pointer. The exception is when you maintain a store where you refer back to past values as well; a doubly linked list is handy there (e.g. a CLI command-line interpreter's history).
A singly linked list is hard on time: the traversal required to reach the node before the current one depends on the number of elements in the list.
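Removal from a singly linked list can be sketched by tracking the predecessor during the O(n) search, which is why a back pointer buys little when the search dominates (illustrative names; a hedged sketch):

```c
#include <stdlib.h>

typedef struct SNode {
    int value;
    struct SNode *next;
} SNode;

/* Removes the first node holding `value`. The search is O(n); the
 * missing back pointer only means we must remember `prev` as we walk. */
static int slist_remove(SNode **head, int value) {
    SNode *prev = NULL, *cur = *head;
    while (cur && cur->value != value) {
        prev = cur;
        cur = cur->next;
    }
    if (!cur) return 0;              /* not found */
    if (prev) prev->next = cur->next;
    else      *head = cur->next;     /* removing the head */
    free(cur);
    return 1;
}
```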

Dynamic Data Structure in C for ordering time_t objects?

I need to add an unknown number of times (time_t values) from multiple pthreads to a data structure and order them earliest first. Can anybody recommend a good structure (linked list / array list) for this?
A linked list will be O(n) in finding the place where the new object is to go, but constant in inserting it.
A dynamic array/array list will be O(log(n)) finding the right place but worst case O(n) insertion, since you'll need to move all values past the insertion point one over.
If you don't need random access, or at least not until the end, you could use a heap, O(log(n)) insertion, after you're done you can pull them out in O(log(n)) each, so O(n*log(n)) for all of them.
And it's possible there's a (probably tree-based) structure that can do all of it in O(log(n)) (red-black tree?).
So, in the end it boils down to how, precisely, you want to use it.
Edit: Looked up red-black trees and it looks like they are O(log(n)) search ("amortized O(1)", according to Wikipedia), insertion, and deletion, so that may be what you want.
If you just need the order at the end, use a linked list to store the times, maintaining a count of records added. Then create an array of size count, copying the elements to the newly created array and deleting them from the list.
Finally, sort the array using qsort.
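That final qsort step might look like this; the comparator compares rather than subtracts, since `time_t` may be wide or even a floating type:

```c
#include <stdlib.h>
#include <time.h>

/* Comparator for qsort over an array of time_t, earliest first.
 * Returns -1, 0, or 1 without risking subtraction overflow. */
static int cmp_time(const void *a, const void *b) {
    time_t ta = *(const time_t *)a, tb = *(const time_t *)b;
    return (ta > tb) - (ta < tb);
}

/* usage: qsort(times, count, sizeof(time_t), cmp_time); */
```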
If you need to maintain an ordered collection of times as you go, use a heap.
The former approach would have the following complexity:
O(n) for the inserts
O(n log n) for the sorting
The latter approach would have:
O(n log n) for inserting and fetching
You could also look at a priority queue.
Please note that if you are open to using the STL (from C++), you can go for std::priority_queue.
In terms of memory, the latter approach would consume more if the heap is built from pointer-based nodes, because you have to store two pointers per node.
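The heap approach could be sketched as a small array-backed min-heap of `time_t` (a hedged sketch with a fixed capacity and illustrative names; with multiple pthreads you would guard `heap_push`/`heap_pop` with a mutex):

```c
#include <time.h>

#define HEAP_CAP 1024

typedef struct {
    time_t a[HEAP_CAP];
    int n;
} MinHeap;

/* O(log n) insert: append, then sift up while smaller than parent. */
static void heap_push(MinHeap *h, time_t t) {
    int i = h->n++;
    h->a[i] = t;
    while (i > 0 && h->a[(i - 1) / 2] > h->a[i]) {
        time_t tmp = h->a[i];
        h->a[i] = h->a[(i - 1) / 2];
        h->a[(i - 1) / 2] = tmp;
        i = (i - 1) / 2;
    }
}

/* O(log n) removal of the earliest time: move the last element to
 * the root, then sift down toward the smaller child. */
static time_t heap_pop(MinHeap *h) {
    time_t min = h->a[0];
    h->a[0] = h->a[--h->n];
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < h->n && h->a[l] < h->a[s]) s = l;
        if (r < h->n && h->a[r] < h->a[s]) s = r;
        if (s == i) break;
        time_t tmp = h->a[i]; h->a[i] = h->a[s]; h->a[s] = tmp;
        i = s;
    }
    return min;
}
```

Popping everything at the end yields the times earliest-first in O(n log n) total.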

C Directed Graph Implementation Choice

Welcome, my friend,
In some homework of mine, I feel the need to use the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
The issue I'm facing has to do with complexity. What data structure should I use to represent the set of nodes? I forgot to say that I have already decided to use the adjacency list technique.
Generally, textbooks mention a linked list, but, it is to my understanding that whenever a linked list is useful and we need to perform searches, a tree is better.
But then again, what we need is to associate a node with its list of adjacent nodes, so what about a hash table?
Can you help me decide which data structure (linked list, tree, hash table) I should store the nodes in?
...the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
That's basically the point of an ADT (Abstract Data Type).
Regarding which data structure to use, you can use either.
For the set of nodes, a Hash table would be a good option (if you have a good C implementation for it). You will have amortized O(1) access to any node.
A linked list will take worst-case O(n) time to find a node; a balanced tree will take O(log n) and won't give you any advantage over the hash table, unless for some reason you will sort the set of nodes a LOT (in which case an inorder traversal of the tree yields the nodes in sorted order in O(n) time).
Regarding adjacency lists for each node, it depends what you want to do with the graph.
If you will implement only, say, DFS and BFS, you need to iterate through all neighbors of a specific node, so LinkedList is the simplest way and it is sufficient.
But if you need to check whether a specific edge exists, it will take worst-case O(n) time, because you need to iterate through the whole list (an adjacency-matrix implementation would make this operation O(1)).
A LinkedList of adjacent nodes can be sufficient, it depends on what you are going to do.
If you need to know what nodes are adjacent to one another, you could use an adjacency matrix. In other words, for a graph of n nodes, you have an n x n matrix whose entry for (i,j) is 1 if i and j are next to each other in the graph.
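A minimal sketch of that matrix for a directed graph, assuming a fixed node count `N` (names are illustrative):

```c
/* n x n adjacency matrix: m[i][j] is 1 iff there is an edge i -> j.
 * O(1) edge insertion and O(1) edge-existence check, at O(n^2) memory. */

#define N 4

typedef struct {
    unsigned char m[N][N];
} Graph;

static void add_edge(Graph *g, int i, int j) { g->m[i][j] = 1; }
static int  has_edge(const Graph *g, int i, int j) { return g->m[i][j]; }
```

For an undirected graph you would set both `m[i][j]` and `m[j][i]`.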

Quickly searching a doubly linked list

I currently have a simple database program that reads keys in from a text file and stores them in a doubly linked list (values are read later if they are required). Currently I do a sequential search on the list, but that is clearly rather slow. I was hoping there is another way to do it. I have been reading about binary trees (in particular, red-black trees), but I don't know too much about them, and was hoping I could glean something from the stackoverflow hivemind :) I suppose my question is: what is the fastest way to do a search in a doubly linked list?
EDIT: Forgot to say that the list is sorted. Don't know if that changes anything. Also, the reason I only read in keys is that the max value length is 1024*32 bytes, which I feel is too large. Note that this is for an assignment, so "typical usage scenarios" don't apply. The professors are likely going to be stress testing the hell out of this thing, and I don't want to be mallocing blocks that big.
There is a thing called a "skip list" that you could use.
It is a set of ordered lists, where each successive list skips over more of the items. This lets you do a form of binary search. However, maintaining the lists during insertion and deletion is more difficult.
The fastest way to do a search in an unsorted doubly-linked list is one element at a time.
If you're trying to make search faster, don't use a linked list. Your idea of using a binary tree, for example, will certainly be faster, but as Matthew Flaschen said in comments, it's a completely different implementation from what you're using now.
Given that your doubly-linked list is sorted, and you have a list of items to search for, I suggest looking into the problem of building a self-balancing binary search tree. The tree construction could take some time, but it will be amortized if you have a long list of items to search for.
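Since the list in the question is sorted, even a plain sequential search can at least stop early once it walks past where the key would be; a minimal sketch with illustrative names (`DNode`, `find_sorted`):

```c
#include <string.h>

typedef struct DNode {
    const char *key;
    struct DNode *prev, *next;
} DNode;

/* Still O(n) worst case, but the sortedness lets a failed search
 * stop as soon as the keys pass the target. */
static DNode *find_sorted(DNode *head, const char *key) {
    for (DNode *p = head; p; p = p->next) {
        int c = strcmp(p->key, key);
        if (c == 0) return p;
        if (c > 0)  return NULL;  /* passed where the key would be */
    }
    return NULL;
}
```

This is a stopgap, not a fix: for genuinely fast lookup the tree or skip-list suggestions above are the right direction.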