C data structure suitable for fast search and simple add/remove

As stated in the title, I need a data structure suited to fast, efficient searching. It should also support adding and removing elements at any position.
Currently I'm using a linked list. The problem is that I have to walk through the list to find a desired element. General searching algorithms (binary search, jump search, and so on) are not directly usable on linked lists, because there is no random access to list elements. Keeping the list sorted, which these algorithms require, is also a problem.
On the other hand, I can't use arrays, because adding or removing an element at an arbitrary index means shifting every element after it.
I've looked for searching algorithms for linked lists and came across skip lists. Now I'm asking whether there is a better data structure for my case, or a better search algorithm for linked lists.

I would use an AVL tree (a self-balancing binary search tree).
For an example, take a look at https://www.geeksforgeeks.org/avl-tree-set-1-insertion/ and https://www.geeksforgeeks.org/avl-tree-set-2-deletion/
They are well detailed, with C code and diagrams.
Searching, adding, and deleting values all take O(log n) time, because the tree is kept balanced.
It works for numeric keys as well as string keys (for example, a dictionary), as long as you can compare two keys.
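To make the suggestion concrete, here is a minimal C sketch of AVL insertion and lookup for int keys, using the usual four rotation cases described in articles like the ones linked above. The names (AvlNode, avl_insert, avl_contains, avl_free) are hypothetical, and deletion is left out to keep it short; it rebalances with the same rotations.

```c
#include <stdlib.h>

/* Hypothetical AVL node for int keys; a leaf has height 1, an empty tree 0. */
typedef struct AvlNode {
    int key;
    int height;
    struct AvlNode *left, *right;
} AvlNode;

static int height(const AvlNode *n) { return n ? n->height : 0; }

static void update_height(AvlNode *n) {
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation around y: y's left child x becomes the subtree root. */
static AvlNode *rotate_right(AvlNode *y) {
    AvlNode *x = y->left;
    y->left = x->right;
    x->right = y;
    update_height(y);   /* y is now below x, so update it first */
    update_height(x);
    return x;
}

/* Left rotation around x: mirror image of rotate_right. */
static AvlNode *rotate_left(AvlNode *x) {
    AvlNode *y = x->right;
    x->right = y->left;
    y->left = x;
    update_height(x);
    update_height(y);
    return y;
}

/* Inserts key (duplicates are ignored) and rebalances on the way back up.
   Returns the new root of this subtree. */
AvlNode *avl_insert(AvlNode *node, int key) {
    if (node == NULL) {
        AvlNode *n = malloc(sizeof *n);
        if (n == NULL) return NULL;          /* out of memory: key not inserted */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < node->key)
        node->left = avl_insert(node->left, key);
    else if (key > node->key)
        node->right = avl_insert(node->right, key);
    else
        return node;                          /* already present */

    update_height(node);
    int balance = height(node->left) - height(node->right);

    if (balance > 1 && key < node->left->key)     /* left-left case */
        return rotate_right(node);
    if (balance < -1 && key > node->right->key)   /* right-right case */
        return rotate_left(node);
    if (balance > 1) {                            /* left-right case */
        node->left = rotate_left(node->left);
        return rotate_right(node);
    }
    if (balance < -1) {                           /* right-left case */
        node->right = rotate_right(node->right);
        return rotate_left(node);
    }
    return node;
}

/* Iterative O(log n) lookup; returns 1 if key is present. */
int avl_contains(const AvlNode *node, int key) {
    while (node != NULL) {
        if (key == node->key) return 1;
        node = key < node->key ? node->left : node->right;
    }
    return 0;
}

void avl_free(AvlNode *node) {
    if (node == NULL) return;
    avl_free(node->left);
    avl_free(node->right);
    free(node);
}
```

Usage: start with AvlNode *root = NULL; and always reassign, as in root = avl_insert(root, 42);, because a rotation can change which node is the root.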

Related

Efficiently Finding if an element exists in a linked List

Suppose there is a linked list holding data. My program's architecture depends on being able to find out whether certain information already exists in some node of this linked list.
(That is, check whether the integer member int example already holds the value 5 in some node.)
After that, each node can be systematically operated on.
I can think of various ways to do this, but I was hoping someone more experienced might show something fairly efficient.
Also, is this good practice, or should there be another, more suitable data structure out there?
Thanks!
If O(N) is not enough, a sorted array with binary search, or a balanced BST, gives you O(log N) lookup. Alternatively, look at the hash map data structure: it gives nearly constant-time lookup by key, but is more complicated than the other options, and the C standard library has no implementation of one.
Otherwise, checking each element in turn is the best you can hope to do with a linked list.
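Since standard C has no hash map, a minimal sketch may help judge how much work one is. This is a chained hash set of int keys with a fixed number of buckets; the names (HashSet, hs_add, hs_contains, hs_remove, hs_clear, NBUCKETS) are hypothetical, and a real table would use a stronger hash function and resize as it fills.

```c
#include <stdlib.h>

/* Minimal chained hash set for int keys (hypothetical names).
   Fixed bucket count and no resizing, to keep the sketch short. */
#define NBUCKETS 1024

typedef struct HsEntry {
    int key;
    struct HsEntry *next;
} HsEntry;

typedef struct {
    HsEntry *buckets[NBUCKETS];
} HashSet;

/* Deliberately simple hash: the unsigned cast keeps negative keys in range. */
static size_t bucket_of(int key) {
    return (size_t)((unsigned int)key % NBUCKETS);
}

void hs_init(HashSet *s) {
    for (size_t i = 0; i < NBUCKETS; i++) s->buckets[i] = NULL;
}

/* Average O(1): only the keys in one bucket are compared. */
int hs_contains(const HashSet *s, int key) {
    for (const HsEntry *e = s->buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (e->key == key) return 1;
    return 0;
}

/* Returns 1 if inserted, 0 if already present or out of memory. */
int hs_add(HashSet *s, int key) {
    if (hs_contains(s, key)) return 0;
    HsEntry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    size_t b = bucket_of(key);
    e->key = key;
    e->next = s->buckets[b];   /* prepend to the bucket's chain */
    s->buckets[b] = e;
    return 1;
}

/* Returns 1 if key was found and removed. */
int hs_remove(HashSet *s, int key) {
    HsEntry **pp = &s->buckets[bucket_of(key)];
    while (*pp != NULL) {
        if ((*pp)->key == key) {
            HsEntry *dead = *pp;
            *pp = dead->next;      /* unlink without special-casing the chain head */
            free(dead);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}

void hs_clear(HashSet *s) {
    for (size_t i = 0; i < NBUCKETS; i++) {
        HsEntry *e = s->buckets[i];
        while (e != NULL) {
            HsEntry *next = e->next;
            free(e);
            e = next;
        }
        s->buckets[i] = NULL;
    }
}
```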

File format need for inverted indexing

I have been working on inverted indexing: it indexes a document collection, stores each term with its information, and stores the term's references (document ID, location, etc.) in a postings file.
Currently I store it in a .txt file, which requires string matching against that file for every query; this is slow and rather complex.
Now I want to store that information in a file using a linked-list-style data structure. Is this possible for this kind of scenario? (I am using PHP for the indexing.)
Any help will be appreciated, thanks.
The point of an inverted index is to allow extremely fast access to the list of occurrences (the postings list) for any given term. If you want to implement it using simple, readily available data structures, then the best you can probably do is:
Use a hash table to store the mapping from terms to postings lists
Store each postings list as a contiguous block of sorted integers (i.e. something like ArrayList in Java or std::vector in C++). Do not use a linked list, because that wastes a huge amount of space on pointers
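A minimal C sketch of the second point (the asker uses PHP; C is used here only to match the rest of this page). It assumes, for illustration, that documents are indexed in increasing ID order, so appending keeps each postings list sorted without shifting. The names (Postings, postings_append, postings_contains) are hypothetical, and the hash from terms to lists is not shown. Lookup uses the standard bsearch over the contiguous block.

```c
#include <stdlib.h>

/* One postings list: a contiguous, sorted, growable array of document IDs. */
typedef struct {
    int *ids;
    size_t len, cap;
} Postings;

/* Appends doc_id, assuming IDs arrive in increasing order so the array stays sorted.
   Returns 0 on success, -1 if doc_id is out of order, a duplicate, or memory runs out. */
int postings_append(Postings *p, int doc_id) {
    if (p->len > 0 && p->ids[p->len - 1] >= doc_id) return -1;
    if (p->len == p->cap) {
        size_t ncap = p->cap ? p->cap * 2 : 8;   /* amortized O(1) growth */
        int *grown = realloc(p->ids, ncap * sizeof *grown);
        if (grown == NULL) return -1;
        p->ids = grown;
        p->cap = ncap;
    }
    p->ids[p->len++] = doc_id;
    return 0;
}

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* O(log n) membership test by binary search over the sorted block. */
int postings_contains(const Postings *p, int doc_id) {
    return p->len > 0 &&
           bsearch(&doc_id, p->ids, p->len, sizeof *p->ids, cmp_int) != NULL;
}
```

A zero-initialized Postings p = {0}; is a valid empty list; release it with free(p.ids).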
A more complete (and more sophisticated) implementation would take into account:
That postings lists can get very large, so you would have to break them up into multiple chunks, each stored as one contiguous block
That postings lists can and should be compressed
Detailed descriptions of these techniques are found in the classic book Managing Gigabytes.
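As one illustration of compression (a common technique, not necessarily the scheme the book focuses on), a sorted postings list can be stored as the gaps between consecutive document IDs, with each gap written in a variable-byte code: 7 payload bits per byte, and the high bit marking a gap's last byte. Small gaps then take a single byte. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes strictly increasing doc IDs as gaps in variable-byte form:
   7 payload bits per byte, low-order group first, high bit set on a gap's final byte.
   out must hold at least 5 * n bytes. Returns the number of bytes written. */
size_t vbyte_encode(const uint32_t *ids, size_t n, uint8_t *out) {
    size_t pos = 0;
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t gap = ids[i] - prev;   /* the first gap is the first ID itself */
        prev = ids[i];
        while (gap >= 128) {
            out[pos++] = (uint8_t)(gap & 127);
            gap >>= 7;
        }
        out[pos++] = (uint8_t)(gap | 128);   /* terminator byte */
    }
    return pos;
}

/* Decodes len bytes back into doc IDs; returns the number of IDs written. */
size_t vbyte_decode(const uint8_t *in, size_t len, uint32_t *ids) {
    size_t n = 0;
    uint32_t prev = 0, gap = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        gap |= (uint32_t)(in[i] & 127) << shift;
        if (in[i] & 128) {          /* last byte of this gap: undo the delta */
            prev += gap;
            ids[n++] = prev;
            gap = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    return n;
}
```

For example, the IDs 3, 7, 200, 100000, 100001 become the gaps 3, 4, 193, 99800, 1, which encode into 1 + 1 + 2 + 3 + 1 = 8 bytes.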

C Database Design, Sortable by Multiple Fields

If memory is not an issue for my particular application (entry, lookup, and sort speed being the priorities), what kind of data structure/concept would be the best option for a multi-field rankings table?
For example, let's say I want to create a Hall of Fame for a game, sortable by top score (independent of username), username (with all scores by the same user placed together before ranking users by their highest scores), or level reached (independent of score or name). In this example, if I order a linked list, vector, or any other sequential data structure by the top score of each player, it makes searching for the other fields -- like level and non-highest scores -- more iterative (i.e. iterate across all looking for the stage, or looking for a specific score-range), unless I conceive some other way to store the information sorted when I enter new data.
The question is whether there is a more efficient (albeit complicated and memory-consumptive) method or database structure in C/C++ that might be primed for this kind of multi-field sort. Linked lists seem fine for simple score rankings, and I could even organize a hashtable by hashing on a single field (player name, or level reached) to sort by a single field, but then the other fields take O(N) to find, worse to sort. With just three fields, I wonder if there is a way (like sets or secondary lists) to prevent iterating in certain pre-desired sorts that we know beforehand.
Do it the same way databases do it: using index structures. You have your main data as a number of records (structs), perhaps ordered according to one of your sorting criteria. Then you have index structures, each one ordered according to one of your other sorting criteria, but these index structures don't contain copies of all the data, just pointers to the main data records. (Think "index" like the index in a book, with page numbers "pointing" into the main data body.)
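A minimal C sketch of the idea for the Hall of Fame example: the records sit in one array, and each index is a separate array of pointers into it, sorted with qsort by its own comparator. The Record fields and function names are hypothetical. Note that by_name_then_score groups a player's entries together in name order; ranking players by their best score, as the question describes, would need a slightly richer comparator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Hall of Fame record; all records live in one main array. */
typedef struct {
    char name[16];
    int score;
    int level;
} Record;

/* qsort passes pointers to index elements, i.e. pointers to Record pointers. */
static const Record *rec(const void *p) { return *(const Record *const *)p; }

/* Descending three-way comparison that avoids overflow from subtraction. */
static int cmp_desc(int x, int y) { return (x < y) - (x > y); }

static int by_score(const void *a, const void *b) {
    return cmp_desc(rec(a)->score, rec(b)->score);
}

static int by_level(const void *a, const void *b) {
    return cmp_desc(rec(a)->level, rec(b)->level);
}

/* Groups each player's entries (name order), best score first within a name. */
static int by_name_then_score(const void *a, const void *b) {
    int c = strcmp(rec(a)->name, rec(b)->name);
    return c != 0 ? c : cmp_desc(rec(a)->score, rec(b)->score);
}

/* Builds an index: an array of pointers into records, ordered by cmp.
   The records themselves are never copied or moved. Caller frees the index. */
Record **build_index(Record *records, size_t n,
                     int (*cmp)(const void *, const void *)) {
    Record **idx = malloc(n * sizeof *idx);
    if (idx == NULL) return NULL;
    for (size_t i = 0; i < n; i++) idx[i] = &records[i];
    qsort(idx, n, sizeof *idx, cmp);
    return idx;
}
```

A sorted pointer array like this is cheap to walk in order, but inserting into it costs O(n) shifting (or a re-sort), which is why tree structures fit better when records keep arriving.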
Using ordered linked lists for your index structures gives you a fast and simple way to go through the records in order, but it will be slow if you need to search for a given value, and similarly slow when inserting new data.
Hash tables will have fast search and insertion, but (with normal hash tables) won't help you with ordering at all.
So I suggest some sort of tree structure. Balanced binary trees (look for AVL trees) work well in main memory.
But don't forget the option to use an actual database! Database managers such as MySQL and SQLite can be linked with your program, without a separate server, and let you do all your sorting and indexing very easily, using SQL embedded in your program. It will probably execute a bit slower than if you hand-craft your own main-memory data structures, or if you use main-memory data structures from a library, but it might be easier to code, and you won't need to write separate code to save the data on disk.
So, you already know how to store your data and keep it sorted with respect to a single field. Assuming the values of the fields for a single entry are independent, the only way you'll get what you want is to keep three different lists (using the data structure of your choice), each sorted by a different field. This costs three times as much memory in pointers as a single list.
As for what data structure each of the lists should be, a binary max heap will be effective. Insertion is O(log N) and reading the top entry is O(1); however, listing the entries in order means removing them one at a time at O(log N) each, so O(N log N) to see all of them. If in some of these lists the entries need to be sub-sorted by another field, just take that into account in the comparison function.
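A minimal C sketch of one such array-backed binary max heap. The Score type, the outranks comparator (higher score first, ties broken by higher level, as an example of sub-sorting in the comparison function), and the heap_* names are hypothetical. Reading the top entry, h.items[0], is O(1); heap_push and heap_pop are O(log N).

```c
#include <stdlib.h>

typedef struct { int score; int level; } Score;

/* Returns 1 if a should come out of the heap before b. */
static int outranks(const Score *a, const Score *b) {
    if (a->score != b->score) return a->score > b->score;
    return a->level > b->level;   /* sub-sort on a second field */
}

typedef struct {
    Score *items;   /* items[0] is always the top entry */
    size_t len, cap;
} MaxHeap;

static void swap_scores(Score *a, Score *b) { Score t = *a; *a = *b; *b = t; }

/* Appends at the end, then sifts up. Returns 0 on success, -1 if out of memory. */
int heap_push(MaxHeap *h, Score s) {
    if (h->len == h->cap) {
        size_t ncap = h->cap ? h->cap * 2 : 16;
        Score *grown = realloc(h->items, ncap * sizeof *grown);
        if (grown == NULL) return -1;
        h->items = grown;
        h->cap = ncap;
    }
    size_t i = h->len++;
    h->items[i] = s;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (!outranks(&h->items[i], &h->items[parent])) break;
        swap_scores(&h->items[i], &h->items[parent]);
        i = parent;
    }
    return 0;
}

/* Moves the top entry into *out, then sifts the last entry down from the root.
   Returns 0 on success, -1 if the heap is empty. */
int heap_pop(MaxHeap *h, Score *out) {
    if (h->len == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, best = i;
        if (l < h->len && outranks(&h->items[l], &h->items[best])) best = l;
        if (r < h->len && outranks(&h->items[r], &h->items[best])) best = r;
        if (best == i) break;
        swap_scores(&h->items[i], &h->items[best]);
        i = best;
    }
    return 0;
}
```

A zero-initialized MaxHeap h = {0}; is a valid empty heap; release it with free(h.items).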

C Directed Graph Implementation Choice

Hello, my friend,
In some homework of mine, I feel the need to use the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
The issue I'm facing has to do with complexity. What data structure should I use to represent the set of nodes? I forgot to say that I already decided to use the adjacency list technique.
Generally, textbooks mention a linked list, but it is my understanding that whenever a linked list would do and we need to perform searches, a tree is better.
But then again, what we need is to associate a node with its list of adjacent nodes, so what about a hash table?
Can you help me decide in which data structure (linked list, tree, hash table) I should store the nodes?
...the Graph ADT. However, I'd like to have it, how do I say, generic. That is to say, I want to store in it whatever I fancy.
That's basically the point of an ADT (Abstract Data Type).
Regarding which data structure to use, you can use any of the three.
For the set of nodes, a Hash table would be a good option (if you have a good C implementation for it). You will have amortized O(1) access to any node.
A linked list takes O(n) time in the worst case to find a node; a balanced tree takes O(log n) and won't give you any advantage over the hash table unless you need the set of nodes in sorted order a LOT (an in-order traversal of the tree visits them in sorted order in O(n) time).
Regarding the adjacency list for each node, it depends on what you want to do with the graph.
If you only implement, say, DFS and BFS, you need to iterate through all neighbors of a given node, so a linked list is the simplest way and it is sufficient.
But if you need to check whether a specific edge exists, it takes O(n) time in the worst case, because you need to iterate through the whole list (an adjacency matrix would make this operation O(1)).
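A minimal C sketch of the adjacency-list approach, assuming nodes are numbered 0..n-1 (so the set of nodes is just an array, sidestepping the container question) and that edges are directed. The type and function names are hypothetical. graph_bfs shows the iterate-over-neighbors pattern, and graph_has_edge shows why an edge test may have to walk a node's whole list.

```c
#include <stdlib.h>

/* One singly linked adjacency list per node; nodes are numbered 0..n-1. */
typedef struct Edge {
    int to;
    struct Edge *next;
} Edge;

typedef struct {
    int n;
    Edge **adj;   /* adj[u] is the head of u's neighbor list */
} Graph;

Graph *graph_new(int n) {
    Graph *g = malloc(sizeof *g);
    if (g == NULL) return NULL;
    g->n = n;
    g->adj = malloc((size_t)n * sizeof *g->adj);
    if (g->adj == NULL && n > 0) { free(g); return NULL; }
    for (int i = 0; i < n; i++) g->adj[i] = NULL;
    return g;
}

/* O(1): prepend v to u's list (directed edge u -> v). Returns 0 on success. */
int graph_add_edge(Graph *g, int u, int v) {
    Edge *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    e->to = v;
    e->next = g->adj[u];
    g->adj[u] = e;
    return 0;
}

/* Worst case walks u's entire list. */
int graph_has_edge(const Graph *g, int u, int v) {
    for (const Edge *e = g->adj[u]; e != NULL; e = e->next)
        if (e->to == v) return 1;
    return 0;
}

/* Breadth-first search from src; dist[i] gets the hop count, or -1 if unreachable. */
int graph_bfs(const Graph *g, int src, int *dist) {
    int *queue = malloc((size_t)g->n * sizeof *queue);  /* each node is enqueued at most once */
    if (queue == NULL) return -1;
    for (int i = 0; i < g->n; i++) dist[i] = -1;
    int head = 0, tail = 0;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (const Edge *e = g->adj[u]; e != NULL; e = e->next) {
            if (dist[e->to] == -1) {
                dist[e->to] = dist[u] + 1;
                queue[tail++] = e->to;
            }
        }
    }
    free(queue);
    return 0;
}

void graph_free(Graph *g) {
    for (int u = 0; u < g->n; u++) {
        Edge *e = g->adj[u];
        while (e != NULL) {
            Edge *next = e->next;
            free(e);
            e = next;
        }
    }
    free(g->adj);
    free(g);
}
```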
A LinkedList of adjacent nodes can be sufficient, it depends on what you are going to do.
If you need to know which nodes are adjacent to one another, you could use an adjacency matrix. In other words, for a graph of n nodes, you keep an n x n matrix whose entry (i, j) is 1 if i and j are adjacent in the graph and 0 otherwise.
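A minimal C sketch of the matrix variant, storing the n x n matrix as one contiguous block of bytes. The names are hypothetical; since the question is about a directed graph, adjm_set records a single direction, and for an undirected graph you would set both (i, j) and (j, i).

```c
#include <stdlib.h>

typedef struct {
    int n;
    unsigned char *m;   /* row-major: entry (i, j) lives at m[i * n + j] */
} AdjMatrix;

/* Allocates an n x n matrix with every entry 0 (no edges). Returns 0 on success. */
int adjm_init(AdjMatrix *a, int n) {
    a->n = n;
    a->m = calloc((size_t)n * (size_t)n, 1);
    return a->m != NULL ? 0 : -1;
}

/* Records the directed edge i -> j. */
void adjm_set(AdjMatrix *a, int i, int j) {
    a->m[(size_t)i * (size_t)a->n + (size_t)j] = 1;
}

/* O(1) edge test, paid for with O(n^2) memory. */
int adjm_has(const AdjMatrix *a, int i, int j) {
    return a->m[(size_t)i * (size_t)a->n + (size_t)j];
}

void adjm_free(AdjMatrix *a) {
    free(a->m);
    a->m = NULL;
}
```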

Quickly searching a doubly linked list

I currently have a simple database program that reads keys in from a text file and stores them in a doubly linked list (values are read later if they are required). Currently, I do a sequential search on the list, but that is clearly rather slow. I was hoping there is another way to do it. I was reading about binary trees (in particular, red-black trees) but I don't know too much about them, and was hoping that I could glean something from the Stack Overflow hivemind :) I suppose my question is, what is the fastest way to do a search in a doubly linked list?
EDIT: Forgot to say that the list is sorted. Don't know if that changes anything. Also, the reason I only read in keys is that the max value length is 1024*32 bytes, which I feel is too large. Note that this is for an assignment, so "typical usage scenarios" don't apply. The professors are likely going to be stress testing the hell out of this thing, and I don't want to be mallocing blocks that big.
There is a thing called a "skip list" that you could use.
It is a hierarchy of ordered lists in which each higher list skips over more of the items. This lets you do a form of binary search, with expected O(log n) search time. However, maintaining the lists is more difficult.
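A minimal C sketch of a skip list over int keys, assuming a fixed maximum of 16 levels and a coin flip via rand() to choose each new node's height. The names are hypothetical, and every node reserves room for all 16 levels to keep the code short; a real implementation would allocate only the levels a node uses.

```c
#include <stdlib.h>

#define MAX_LEVEL 16

typedef struct SkipNode {
    int key;
    struct SkipNode *next[MAX_LEVEL];   /* next[i]: successor on level i */
} SkipNode;

typedef struct {
    SkipNode head;   /* sentinel before the first element; its key is never compared */
    int level;       /* number of levels currently in use, at least 1 */
} SkipList;

void sl_init(SkipList *s) {
    s->head.key = 0;
    for (int i = 0; i < MAX_LEVEL; i++) s->head.next[i] = NULL;
    s->level = 1;
}

/* Each additional level is kept with probability 1/2. */
static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;
    return lvl;
}

/* Descends from the top level, moving right while the next key is smaller.
   update[i] receives the last node before key on level i. */
static SkipNode *find_preds(SkipList *s, int key, SkipNode **update) {
    SkipNode *x = &s->head;
    for (int i = s->level - 1; i >= 0; i--) {
        while (x->next[i] != NULL && x->next[i]->key < key)
            x = x->next[i];
        update[i] = x;
    }
    return x;   /* level-0 predecessor of key */
}

int sl_contains(SkipList *s, int key) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_preds(s, key, update)->next[0];
    return x != NULL && x->key == key;
}

/* Returns 1 if inserted, 0 if already present or out of memory. */
int sl_insert(SkipList *s, int key) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_preds(s, key, update)->next[0];
    if (x != NULL && x->key == key) return 0;
    SkipNode *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    int lvl = random_level();
    for (int i = s->level; i < lvl; i++) update[i] = &s->head;  /* new top levels start at head */
    if (lvl > s->level) s->level = lvl;
    n->key = key;
    for (int i = 0; i < MAX_LEVEL; i++) n->next[i] = NULL;
    for (int i = 0; i < lvl; i++) {   /* splice into each of the node's levels */
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
    return 1;
}

/* Returns 1 if key was found and removed. */
int sl_remove(SkipList *s, int key) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_preds(s, key, update)->next[0];
    if (x == NULL || x->key != key) return 0;
    for (int i = 0; i < s->level && update[i]->next[i] == x; i++)
        update[i]->next[i] = x->next[i];
    while (s->level > 1 && s->head.next[s->level - 1] == NULL)
        s->level--;   /* drop now-empty top levels */
    free(x);
    return 1;
}

void sl_free(SkipList *s) {
    SkipNode *x = s->head.next[0];
    while (x != NULL) {
        SkipNode *next = x->next[0];
        free(x);
        x = next;
    }
    sl_init(s);
}
```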
The fastest way to do a search in an unsorted doubly-linked list is one element at a time.
If you're trying to make search faster, don't use a linked list. Your idea of using a binary tree, for example, will certainly be faster, but as Matthew Flaschen said in comments, it's a completely different implementation from what you're using now.
Given that your doubly-linked list is sorted, and you have a list of items to search for, I suggest looking into the problem of building a self-balancing binary search tree. The tree construction could take some time, but it will be amortized if you have a long list of items to search for.
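Since the list is already sorted, one option (different from incrementally building a self-balancing tree) is to convert it in place into a height-balanced BST in O(n), reusing each node's prev and next pointers as left and right children. A minimal sketch, assuming int keys and a hypothetical DNode type; the tree is only balanced at construction time, so later inserts would need a self-balancing scheme such as a red-black or AVL tree.

```c
#include <stddef.h>

/* Hypothetical node of the existing sorted doubly linked list.
   After conversion, prev is reused as the left child and next as the right child. */
typedef struct DNode {
    int key;
    struct DNode *prev, *next;
} DNode;

/* Builds a balanced BST from the next n list nodes at *head, consuming them in order:
   left subtree first, then the root, then the right subtree. O(n), no allocation. */
static DNode *build(DNode **head, size_t n) {
    if (n == 0) return NULL;
    DNode *left = build(head, n / 2);
    DNode *root = *head;
    *head = root->next;                       /* read the list successor before overwriting it */
    root->prev = left;                        /* prev now means "left child" */
    root->next = build(head, n - n / 2 - 1);  /* next now means "right child" */
    return root;
}

/* Converts the whole sorted list (head may be NULL) and returns the tree's root. */
DNode *list_to_bst(DNode *head) {
    size_t n = 0;
    for (const DNode *p = head; p != NULL; p = p->next) n++;
    return build(&head, n);
}

/* O(log n) lookup on the converted tree; returns NULL if key is absent. */
DNode *bst_find(DNode *root, int key) {
    while (root != NULL && root->key != key)
        root = key < root->key ? root->prev : root->next;
    return root;
}
```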

Resources