Suppose there is a linked list holding some information. My program's architecture rests upon being able to find out whether certain information already exists in an individual node of this linked list.
(I.e., see whether a member integer value, say int example, already holds a value of 5 in some node.)
After that, each node can be systematically operated on.
I can think of various ways to do this, but I was hoping someone more experienced might show something fairly efficient.
Also, is this good practice, or should there be another, more suitable data structure out there?
Thanks!
If O(N) is not enough, a sorted array with binary search, or a binary search tree (BST), would give you O(log N). Alternatively, you can look at a hash map. This structure will give you nearly constant-time lookup based on a key, but it is more complicated than the other options, and there is no C standard library implementation of one.
Otherwise, searching each element in turn is the best you can hope to do with a linked list.
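For reference, here is a minimal sketch of that linear scan in C, assuming a node type along the lines the question implies (the field names are guesses; adjust them to your real struct):

```c
#include <stddef.h>

struct node {
    int example;        /* the value being searched for */
    struct node *next;  /* next node, NULL at the end of the list */
};

/* Return the first node whose 'example' field equals 'value',
 * or NULL if no such node exists. This is the O(N) scan. */
struct node *find_value(struct node *head, int value)
{
    for (struct node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->example == value)
            return cur;
    }
    return NULL;
}
```

The caller can then operate on the returned node, or keep scanning from the returned node's next pointer to handle duplicates.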
Related
As stated in the question header, I need a data structure suitable for fast and efficient searching. The data structure should also be able to add/remove elements to/from anywhere inside the data structure.
Currently I'm using a linked list, but the problem is that I have to walk through the whole list to find a desired element. General search algorithms (binary search, jump search, etc.) are not directly usable on linked lists, as there is no random access to list elements. Sorting the list elements, which these algorithms require, is also a problem.
On the other hand, I can't use arrays, as it's hard to add/remove an element at any desired index.
I've looked for search algorithms for linked lists and came across 'skip lists'. Now I'm here to ask if there is a better data structure for my case, or a better search algorithm for linked lists.
I would use an AVL binary search tree.
For an example of a binary search tree, you can take a look at https://www.geeksforgeeks.org/avl-tree-set-1-insertion/ and https://www.geeksforgeeks.org/avl-tree-set-2-deletion/
The articles are well detailed, with C code and diagrams.
An AVL tree is efficient to search in, and it allows you to add and delete values.
It works for both numeric keys and string keys (such as in a dictionary).
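To give a feel for the O(log N) search itself, here is a minimal iterative lookup for a binary search tree in C; the AVL rebalancing on insertion and deletion (covered in the linked articles) is omitted, and the struct layout is an assumption:

```c
struct tree_node {
    int key;
    struct tree_node *left;   /* subtree with keys smaller than 'key' */
    struct tree_node *right;  /* subtree with keys larger than 'key' */
};

/* Descend from the root, discarding half the remaining tree at each
 * level; O(log N) as long as the tree is kept balanced (AVL). */
struct tree_node *bst_find(struct tree_node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;  /* the matching node, or NULL if absent */
}
```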
It's my understanding that doubly linked lists use more memory but less CPU, and usually provide better algorithmic complexity than simple (singly) linked lists.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list. Is there a clear point/situation after which using one instead of the other is unarguably the better solution? (Like after x elements on a regular PC.)
My specific problem:
I'm implementing a linked-list structure for general use and am wondering whether a back link should be included, as it would greatly decrease the complexity of element removal.
Thanks.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
Choosing a data structure is about weighing costs vs benefits, usually work vs upkeep.
A singly linked list offers easy traversal and easy front insertion. Easy tail insertion can be had at the cost of tracking the last node. That cost means whenever you add/remove a node you have to do an additional check (is this the tail?) and a possible update of the list structure.
A doubly linked list adds a lot more maintenance overhead. Every node now has to store two pointers and those have to be managed and maintained.
If you never need to walk backwards in your list, then a singly linked list is ideal, but not being able to walk backwards means that removal is more expensive.
So you need to start by determining which usage pattern you're going to have. If you are building a one-use list, then a singly linked list may be ideal. If you are building a dynamic list with a high rate of removals, then a doubly linked list will be better suited.
Determining specific costs of operations your data structure offers is the topic of things like 'Big O Notation'.
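To make the removal trade-off concrete, here is a hedged sketch in C (struct and function names are illustrative): unlinking a node you already hold a pointer to is O(1) in a doubly linked list, while the singly linked version has to walk from the head to find the link to fix.

```c
#include <stddef.h>

struct dnode { int value; struct dnode *prev, *next; };
struct snode { int value; struct snode *next; };

/* Doubly linked: O(1) unlink given the node itself.
 * Freeing the node is left to the caller. */
void dlist_remove(struct dnode **head, struct dnode *n)
{
    if (n->prev) n->prev->next = n->next;
    else         *head = n->next;          /* n was the first node */
    if (n->next) n->next->prev = n->prev;
}

/* Singly linked: O(n) walk from the head to find the link to fix. */
void slist_remove(struct snode **head, struct snode *n)
{
    struct snode **pp = head;
    while (*pp != NULL && *pp != n)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = n->next;  /* splice n out of the chain */
}
```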
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list.
When you don't have to go backwards.
If you're doing a one way linear search, there's no incentive to traverse the list the other direction, and you wouldn't need those pointers.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
This doesn't have anything to do with whether the list is singly or doubly linked. If you have to delete something from a list, you first need to find it, which is O(n) time complexity. Having an extra pointer to the previous node doesn't help with that search.
A linked list is a data structure that, most of the time, is used to implement a stack or a queue. Generally speaking, if you are implementing a stack, insertion and deletion happen at a single end; if a queue is used, we usually delete at one end and add at the other. As you can see, neither of these abstract data types needs a doubly linked list, and their add and delete operations don't depend on the number of items.
As mentioned above by Kepani, the only case where you would worry about the number of elements in the list is when you delete in a fashion not described by a stack/queue interface (a non-linear approach), which is when the elements are ordered (it can be otherwise, of course) and the order needs to be maintained.
Using a doubly linked list is definitely hard on the memory requirement, as each node needs to maintain an "extra" pointer. The exception is when you need to maintain a store where you refer to past values as well; a doubly linked list is handy there (e.g., the history of a CLI command-line interpreter).
A singly linked list is hard on time, as the traversal required to reach the previous node of the current one depends on the number of elements in the list.
Welcome, my friend,
In some homework of mine, I feel the need to use the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
The issue I'm facing has to do with complexity. What data structure should I use to represent the set of nodes? I forgot to say that I have already decided to use the adjacency-list technique.
Generally, textbooks mention a linked list, but it is my understanding that whenever a linked list is useful and we need to perform searches, a tree is better.
But then again, what we need is to associate each node with its list of adjacent nodes, so what about a hash table?
Can you help me decide which data structure (linked list, tree, or hash table) I should use to store the nodes?
...the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
That's basically the point of an ADT (Abstract Data Type).
Regarding which data structure to use, you can use any of them.
For the set of nodes, a hash table would be a good option (if you have a good C implementation for it). You will have amortized O(1) access to any node.
A linked list will take worst-case O(n) time to find a node; a balanced tree will be O(log n) and won't give you any advantage over the hash table, unless for some reason you sort the set of nodes a LOT (in which case an inorder traversal of the tree yields the nodes in sorted order in O(n) time).
Regarding the adjacency list for each node, it depends on what you want to do with the graph.
If you will implement only, say, DFS and BFS, you need to iterate through all neighbors of a specific node, so a linked list is the simplest way and is sufficient.
But if you need to check whether a specific edge exists, it will take worst-case O(n) time, because you need to iterate through the whole list (an adjacency-matrix implementation would make this operation O(1)).
A linked list of adjacent nodes can be sufficient; it depends on what you are going to do.
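To make that concrete, here is a minimal sketch of the adjacency-list layout in C (field names are assumptions); the edge-existence check is exactly the linear scan described above:

```c
#include <stddef.h>

struct edge {
    int to;             /* index of the neighboring vertex */
    struct edge *next;  /* next edge in this vertex's list */
};

struct vertex {
    void *data;         /* generic payload, so the ADT stays generic */
    struct edge *adj;   /* head of the adjacency list */
};

/* Worst case O(degree) -- the scan the answer warns about. */
int has_edge(const struct vertex *v, int to)
{
    for (const struct edge *e = v->adj; e != NULL; e = e->next)
        if (e->to == to)
            return 1;
    return 0;
}
```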
If you need to know which nodes are adjacent to one another, you could use an adjacency matrix. In other words, for a graph of n nodes you have an n x n matrix whose (i,j) entry is 1 if nodes i and j are adjacent in the graph.
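A minimal illustration, assuming a small fixed node count (a real implementation would allocate the matrix dynamically):

```c
enum { N = 4 };     /* number of nodes, fixed here for brevity */

int adj[N][N];      /* adj[i][j] == 1 iff nodes i and j are adjacent */

/* O(1) edge test, at the cost of O(n^2) memory. */
int matrix_has_edge(int i, int j)
{
    return adj[i][j];
}
```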
I currently have a simple database program that reads keys in from a text file and stores them in a doubly linked list (values are read later, if they are required). Currently I do a sequential search on the list, but that is clearly rather slow. I was hoping there is another way to do it. I have been reading about binary trees (in particular, red-black trees), but I don't know too much about them and was hoping I could glean something from the Stack Overflow hivemind :) I suppose my question is: what is the fastest way to search a doubly linked list?
EDIT: Forgot to say that the list is sorted. Don't know if that changes anything. Also, the reason I only read in keys is that the maximum value length is 1024*32 bytes, which I feel is too large. Note that this is for an assignment, so "typical usage scenarios" don't apply. The professors are likely going to stress-test the hell out of this thing, and I don't want to be mallocing blocks that big.
There is a thing called a "skip list" that you could use.
It is a set of ordered lists, where each successive list skips over more of the items. This lets you do a form of binary search. However, maintaining the lists on insertion and removal is more difficult.
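For intuition, here is a hedged sketch of the search half of a skip list in C, assuming each node carries an array of forward pointers, one per level; the insertion and level-maintenance logic, which is the difficult part, is omitted:

```c
#include <stddef.h>

#define MAX_LEVEL 16

struct skip_node {
    int key;
    struct skip_node *forward[MAX_LEVEL];  /* forward[i] skips ahead at level i */
};

/* 'head' is a sentinel node; 'level' is the highest level currently
 * in use. Expected O(log N) on a well-maintained skip list. */
struct skip_node *skip_find(struct skip_node *head, int level, int key)
{
    struct skip_node *x = head;
    for (int i = level - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->key < key)
            x = x->forward[i];      /* move right, staying below 'key' */
    }
    x = x->forward[0];              /* candidate on the bottom list */
    return (x != NULL && x->key == key) ? x : NULL;
}
```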
The fastest way to do a search in an unsorted doubly-linked list is one element at a time.
If you're trying to make search faster, don't use a linked list. Your idea of using a binary tree, for example, will certainly be faster, but as Matthew Flaschen said in comments, it's a completely different implementation from what you're using now.
Given that your doubly-linked list is sorted, and you have a list of items to search for, I suggest looking into the problem of building a self-balancing binary search tree. The tree construction could take some time, but it will be amortized if you have a long list of items to search for.
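Since the list is already sorted, the tree can even be built in O(n) by consuming the list nodes in order; a rough sketch, assuming integer keys and leaving out error handling:

```c
#include <stdlib.h>

struct lnode { int key; struct lnode *next; };          /* list node */
struct tnode { int key; struct tnode *left, *right; };  /* tree node */

/* Build a balanced BST from the first 'n' nodes of the sorted list;
 * '*listp' is advanced past the consumed nodes. O(n) overall. */
struct tnode *build_balanced(struct lnode **listp, int n)
{
    if (n <= 0)
        return NULL;
    struct tnode *left = build_balanced(listp, n / 2);  /* left half first */
    struct tnode *root = malloc(sizeof *root);
    root->key = (*listp)->key;
    *listp = (*listp)->next;                            /* consume one node */
    root->left = left;
    root->right = build_balanced(listp, n - n / 2 - 1); /* right half */
    return root;
}
```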
The program I'm working on is about a social network, which means there are users and their profiles. The profile structure is UserProfile.
Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside, there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever it's needed), a pointer to the next Edge and a pointer to the Vertex owner.
I have 2 sample files with data to process (in CSV style) and insert into the graph. The first one holds the user data (one user per line); the second one holds the user relations (for the graph). The first file is inserted into the graph quickly, because I always insert at the head and there are only ~18000 users. The second file takes ages: even though I still insert the edges at the head, the file has about ~520000 lines of user relations and takes 13-15 minutes to insert into the graph. I made a quick test, and reading the data is pretty quick, instantaneous really. The problem is the insertion.
This problem exists because my graph is implemented with linked lists for the vertices. Every time I need to insert a relation, I need to look up 2 vertices so I can link them together. That is the problem: doing this for ~520000 relations takes a while.
How should I solve this?
Solution 1) Some people recommended implementing the graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex, and the insertion time will probably drop considerably. But I don't like the idea of allocating an array with [18000] elements. How practical is this? My sample data has ~18000 users, but what if I need much less, or much more? The linked-list approach has that flexibility: I can have whatever size I want, as long as there's memory for it. The array doesn't, so how am I going to handle such a situation? What are your suggestions?
Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.
Any thoughts about this solution?
Solution 2) This project also demands that I have some sort of data structure that allows quick lookup based on a name index and an ID index. For this I decided to use hash tables. My tables are implemented with separate chaining as collision resolution, and when a load factor of 0.70 is reached, I normally recreate the table. I base the next table size on this link.
Currently, both hash tables hold a pointer to the UserProfile instead of duplicating the user profile itself. That would be stupid: changing data would require 3 changes, and it's really dumb to do it that way. So I just save the pointer to the UserProfile. The same user-profile pointer is also saved as the value in each graph Vertex.
So, I have 3 data structures, one graph and two hash tables, and every single one of them points to the same exact UserProfile. The graph structure will serve the purpose of finding the shortest path and things like that, while the hash tables serve as quick indexes by name and ID.
What I'm thinking of doing to solve my graph problem is, instead of having the hash-table values point to the UserProfile, pointing them to the corresponding Vertex. It's still a pointer; no more and no less space is used, I just change what I point to.
Like this, I can easily and quickly look up each Vertex I need and link them together. This should insert the ~520000 relations pretty quickly.
I thought of this solution because I already have the hash tables and I need to have them; so why not take advantage of them for indexing the graph vertices instead of the user profiles? It's basically the same thing: I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.
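If it helps to picture it, here is a rough sketch of that pointer change (all type layouts here are illustrative, not the actual code):

```c
struct UserProfile;  /* the shared profile data */
struct Edge;         /* adjacency-list element */

struct Vertex {
    struct UserProfile *profile;  /* same pointer all 3 structures share */
    struct Edge *edges;           /* head of this vertex's edge list */
    struct Vertex *next;          /* next vertex in the graph's list */
};

struct hash_entry {
    const char *key;              /* the name (or the ID, in the other table) */
    struct Vertex *vertex;        /* was: struct UserProfile *profile */
    struct hash_entry *next;      /* separate chaining */
};

/* Inserting a relation becomes two O(1) hash lookups plus a head
 * insert, and the profile stays one hop away: entry->vertex->profile. */
```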
But do you see any cons of this second solution compared with the first one? Or only pros that outweigh the pros and cons of the first solution?
Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to waste on this right now; I need to move on with this project, so if I'm going to make such a change, I need to understand exactly what to change and whether that's really the way to go.
Hopefully no one fell asleep reading this and closed the browser; sorry for the big testament. But I really need to decide what to do about this, and I really need to make a change.
P.S.: When answering my proposed solutions, please enumerate them as I did, so I know exactly which one you are talking about and don't confuse myself more than I already am.
Since the main issue here is speed, I would prefer the array approach (Solution 1).
You should, of course, maintain the hash table for the name-index lookup.
If I understood correctly, you only process the data once, so there is no dynamic data insertion.
To deal with the space allocation problem, I would recommend:
1 - Read the file once, to get the number of vertices.
2 - Allocate that space.
If your data is dynamic, you could implement some simple method to increase the array size in steps of 50%.
3 - For the edges, substitute an array for your linked list. This array should be dynamically grown in steps of 50%, as in the sketch below.
Even with the "extra" space allocated, when you grow the array in steps of 50%, the total size used should be only marginally larger than the size of the linked list.
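A rough sketch of that 50% growth step in C (names and layout are assumptions):

```c
#include <stdlib.h>

struct Vertex { void *profile; /* edges, etc. */ };

struct vertex_array {
    struct Vertex *items;  /* contiguous vertex storage */
    size_t count;          /* slots in use */
    size_t capacity;       /* slots allocated */
};

/* Grow by 50% when full; returns 0 on success, -1 if realloc fails. */
int ensure_capacity(struct vertex_array *a)
{
    if (a->count < a->capacity)
        return 0;
    size_t new_cap = a->capacity ? a->capacity + a->capacity / 2 : 16;
    struct Vertex *p = realloc(a->items, new_cap * sizeof *p);
    if (p == NULL)
        return -1;
    a->items = p;
    a->capacity = new_cap;
    return 0;
}
```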
I hope this helps.