Deduplication optimization - c

The problem is as follows. I want a function that, given a list and a maximum number of occurrences x, deletes every element of the list that appears x or more times.
I found a pretty straightforward solution, which is to run the check for each element. That said, repeating the find and delete functions many times does not seem computationally optimal to me.
I was wondering whether you could provide a better algorithm. (I excluded allocating memory for an array spanning the minimum to the maximum value; that is just too much for the task: say you have a few very big numbers and your memory won't allow it.)
My code follows.
typedef struct n_s
{
    int val;
    struct n_s *next;
}
n_t;

// deletes all elements equal to del in list with head h
n_t * delete(n_t *h, int del);
// returns the first occurrence of find in list with head h, otherwise gives NULL
n_t * find(n_t *h, int find);

n_t *
delFromList(n_t *h, int x)
{
    int val;
    n_t *el, *posInter;
    // empty list case
    if (h == NULL)
        return NULL;
    // first element
    val = h -> val;
    if ( (posInter = find(h -> next, val))
         && (find(posInter -> next, val)))
        h = delete(h, val);
    // loop from second element
    el = h;
    while (el -> next)
    {
        val = el -> next -> val;
        // check whether you want to delete the next one,
        // and then if you do so, check again on the "new" next one
        if ((posInter = find(el -> next -> next, val))
            && (find(posInter -> next, val)))
            el -> next = delete(el -> next, val);
        // in case you did not delete the next node, you can move on
        else
            el = el -> next;
    }
    return h;
}
I know that the el->next->next may look confusing, but I find it less intuitive to use variables such as "next" and "past"... so, sorry for the headache.

One option for an algorithm with improved performance is:
Define a data structure D with two members, one for the value of a list element and one to count the number of times it appears.
Initialize an empty balanced tree ordered by value.
Iterate through the list. For each item, look up its value in the tree. If it is not present, insert a D structure into the tree with its value member copied from the list element and its count set to one. If it is present, increment its count. If the count equals or exceeds the threshold, remove the item from the list.
Lookups and insertions in a balanced tree are O(log n). A linked list of n items requires n of them, and each deletion from a linked list is O(1). So the total time is O(n log n).
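For concreteness, the D structure might look like this in C (just a sketch; the field names are illustrative):

typedef struct
{
    int value;   /* the list element's value */
    int count;   /* how many times that value has been seen so far */
} D;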

Use a counting map to count the number of times each element appears. The keys are the elements, and the values are the counts.
Then, go through your list a second time, deleting anything that meets your threshold.
O(n) time, O(n) extra space.
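Here is a rough C sketch of this two-pass idea, reusing the n_t type from the question. The fixed-size chained hash table (HASH_SIZE, count_t, getCount) stands in for the counting map and is purely illustrative; any map with O(1) expected lookup would do.

#include <stdlib.h>

#define HASH_SIZE 1024

typedef struct count_s {
    int val;
    int count;
    struct count_s *next;
} count_t;

static unsigned hash(int v) { return ((unsigned) v) % HASH_SIZE; }

/* returns the counter for val, creating it with count 0 if absent */
static count_t *getCount(count_t **table, int val)
{
    count_t *c;
    for (c = table[hash(val)]; c != NULL; c = c->next)
        if (c->val == val)
            return c;
    c = malloc(sizeof *c);
    c->val = val;
    c->count = 0;
    c->next = table[hash(val)];
    table[hash(val)] = c;
    return c;
}

/* a possible drop-in alternative to the question's delFromList */
n_t *delFromList(n_t *h, int x)
{
    count_t *table[HASH_SIZE] = {0};
    n_t *el, **prev;

    /* first pass: count every value */
    for (el = h; el != NULL; el = el->next)
        getCount(table, el->val)->count++;

    /* second pass: unlink and free nodes whose value reaches the threshold */
    prev = &h;
    while (*prev != NULL) {
        if (getCount(table, (*prev)->val)->count >= x) {
            n_t *doomed = *prev;
            *prev = doomed->next;
            free(doomed);
        } else {
            prev = &(*prev)->next;
        }
    }
    /* (freeing the count_t chains is omitted for brevity) */
    return h;
}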

Related

How to sort a linked list using an efficient algorithm, with only the list sent as parameters? [duplicate]

I'm curious if O(n log n) is the best a linked list can do.
It is reasonable to expect that you cannot do any better than O(N log N) in running time.
However, the interesting part is to investigate whether you can sort it in place, stably, what its worst-case behavior is, and so on.
Simon Tatham, of PuTTY fame, explains how to sort a linked list with merge sort. He concludes with the following comments:
Like any self-respecting sort algorithm, this has running time O(N log N). Because this is Mergesort, the worst-case running time is still O(N log N); there are no pathological cases.
Auxiliary storage requirement is small and constant (i.e. a few variables within the sorting routine). Thanks to the inherently different behaviour of linked lists from arrays, this Mergesort implementation avoids the O(N) auxiliary storage cost normally associated with the algorithm.
There is also an example implementation in C that works for both singly and doubly linked lists.
As @Jørgen Fogh mentions below, big-O notation may hide some constant factors that can cause one algorithm to perform better than another because of memory locality, because of a low number of items, and so on.
Depending on a number of factors, it may actually be faster to copy the list to an array and then use a Quicksort.
The reason this might be faster is that an array has much better
cache performance than a linked list. If the nodes in the list are dispersed in memory, you
may be generating cache misses all over the place. Then again, if the array is large you will get cache misses anyway.
Mergesort parallelises better, so it may be a better choice if that is what you want. It is also much faster if you perform it directly on the linked list.
Since both algorithms run in O(n * log n), making an informed decision would involve profiling them both on the machine you would like to run them on.
Update
I decided to test my hypothesis and wrote a C-program which measured the time (using clock()) taken to sort a linked list of ints. I tried with a linked list where each node was allocated with malloc() and a linked list where the nodes were laid out linearly in an array, so the cache performance would be better. I compared these with the built-in qsort, which included copying everything from a fragmented list to an array and copying the result back again. Each algorithm was run on the same 10 data sets and the results were averaged.
These are the results:
N              merge sort (fragmented)    Array w/qsort    merge sort (packed)
1,000          <1 ms                      <1 ms            <1 ms
100,000        39 ms                      25 ms            9 ms
1,000,000      1,162 ms                   420 ms           112 ms
100,000,000    364,797 ms                 61,166 ms        16,525 ms
Conclusion
At least on my machine, copying into an array is well worth it to improve the cache performance, since you rarely have a completely packed linked list in real life. It should be noted that my machine has a 2.8GHz Phenom II, but only 0.6GHz RAM, so the cache is very important.
This is a nice little paper on this topic. Its empirical conclusion is that Treesort is best, followed by Quicksort and Mergesort. Sediment sort, bubble sort, and selection sort perform very badly.
A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS
by Ching-Kuang Shene
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.9981
Comparison sorts (i.e. ones based on comparing elements) cannot possibly be faster than n log n. It doesn't matter what the underlying data structure is. See Wikipedia.
Other kinds of sort that take advantage of there being lots of identical elements in the list (such as the counting sort), or some expected distribution of elements in the list, are faster, though I can't think of any that work particularly well on a linked list.
As stated many times, the lower bound on comparison-based sorting for general data is going to be O(n log n). To briefly re-summarize these arguments: there are n! different ways a list can be ordered. Any comparison tree that has n! (which is in O(n^n)) possible final orderings is going to need at least log(n!) as its height: this gives you an O(log(n^n)) lower bound, which is O(n log n).
So, for general data on a linked list, the best possible sort that will work on any data that can compare two objects is going to be O(n log n). However, if you have a more limited domain of things to work in, you can improve the time it takes (at least proportionally to n). For instance, if you are working with integers no larger than some value, you could use Counting Sort or Radix Sort, as these use the specific objects you're sorting to reduce the complexity proportionally to n. Be careful, though, as these add some other things to the complexity that you may not consider (for instance, Counting Sort and Radix Sort both add factors based on the size of the numbers you're sorting: O(n+k), where k is the size of the largest number, for Counting Sort).
Also, if you happen to have objects that have a perfect hash (or at least a hash that maps all values differently), you could try using a counting or radix sort on their hash functions.
Not a direct answer to your question, but if you use a Skip List, it is already sorted and has O(log N) search time.
A Radix sort is particularly suited to a linked list, since it's easy to make a table of head pointers corresponding to each possible value of a digit.
Merge sort doesn't require O(1) access and is O(n ln n). No known algorithms for sorting general data are better than O(n ln n).
The special data algorithms such as radix sort (limits size of data) or histogram sort (counts discrete data) could sort a linked list with a lower growth function, as long as you use a different structure with O(1) access as temporary storage.
Another class of special data is a comparison sort of an almost sorted list with k elements out of order. This can be sorted in O(kn) operations.
Copying the list to an array and back would be O(N), so any sorting algorithm can be used if space is not an issue.
For example, given a linked list containing uint8_t values, this code will sort it in O(N) time using a histogram sort:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct _list list_t;

struct _list {
    uint8_t value;
    list_t *next;
};

list_t* sort_list ( list_t* list )
{
    list_t* heads[257] = {0};
    list_t* tails[257] = {0};

    // O(N) loop: append each node to the bucket for its value
    for ( list_t* it = list; it != 0; it = it -> next ) {
        if ( heads[ it -> value ] == 0 ) {
            heads[ it -> value ] = it;
        } else {
            tails[ it -> value ] -> next = it;
        }
        tails[ it -> value ] = it;
    }

    list_t* result = 0;

    // constant time loop: stitch the buckets together from the highest
    // value down so the result comes out in ascending order
    for ( size_t i = 256; i-- > 0; ) {
        if ( tails[i] ) {
            tails[i] -> next = result;
            result = heads[i];
        }
    }

    return result;
}

list_t* make_list ( char* string )
{
    list_t head;
    for ( list_t* it = &head; *string; it = it -> next, ++string ) {
        it -> next = malloc ( sizeof ( list_t ) );
        it -> next -> value = ( uint8_t ) *string;
        it -> next -> next = 0;
    }
    return head.next;
}

void free_list ( list_t* list )
{
    for ( list_t* it = list; it != 0; ) {
        list_t* next = it -> next;
        free ( it );
        it = next;
    }
}

void print_list ( list_t* list )
{
    printf ( "[ " );
    if ( list ) {
        printf ( "%c", list -> value );
        for ( list_t* it = list -> next; it != 0; it = it -> next )
            printf ( ", %c", it -> value );
    }
    printf ( " ]\n" );
}

int main ( int nargs, char** args )
{
    list_t* list = make_list ( nargs > 1 ? args[1] : "wibble" );
    print_list ( list );
    list_t* sorted = sort_list ( list );
    print_list ( sorted );
    free_list ( sorted );
    return 0;
}
As far as I know, the best sorting algorithm is O(n log n), whatever the container; it's been proved that sorting in the broad sense of the word (mergesort/quicksort etc. style) can't go lower. Using a linked list will not give you a better run time.
The only algorithms which run in O(n) are "hack" algorithms which rely on counting values rather than actually sorting.
Here's an implementation that traverses the list just once, collecting runs, then schedules the merges in the same way that mergesort does.
Complexity is O(n log m) where n is the number of items and m is the number of runs. Best case is O(n) (if the data is already sorted) and worst case is O(n log n) as expected.
It requires O(log m) temporary memory; the sort is done in-place on the lists.
(Updated below; a commenter made a good point that I should describe the algorithm here.)
The gist of the algorithm is:
while list not empty
    accumulate a run from the start of the list
    merge the run with a stack of merges that simulate mergesort's recursion
merge all remaining items on the stack
Accumulating runs doesn't require much explanation, but it's good to take the opportunity to accumulate both ascending runs and descending runs (reversed). Here it prepends items smaller than the head of the run and appends items greater than or equal to the end of the run. (Note that prepending should use strict less-than to preserve sort stability.)
It's easiest to just paste the merging code here:
int i = 0;
for ( ; i < stack.size(); ++i) {
    if (!stack[i])
        break;
    run = merge(run, stack[i], comp);
    stack[i] = nullptr;
}
if (i < stack.size()) {
    stack[i] = run;
} else {
    stack.push_back(run);
}
Consider sorting the list (d a g i b e c f j h) (ignoring runs). The stack states proceed as follows:
[ ]
[ (d) ]
[ () (a d) ]
[ (g), (a d) ]
[ () () (a d g i) ]
[ (b) () (a d g i) ]
[ () (b e) (a d g i) ]
[ (c) (b e) (a d g i) ]
[ () () () (a b c d e f g i) ]
[ (j) () () (a b c d e f g i) ]
[ () (h j) () (a b c d e f g i) ]
Then, finally, merge all these lists.
Note that the number of items (runs) at stack[i] is either zero or 2^i and the stack size is bounded by 1+log2(nruns). Each element is merged once per stack level, hence O(n log m) comparisons. There's a passing similarity to Timsort here, though Timsort maintains its stack using something like a Fibonacci sequence where this uses powers of two.
Accumulating runs takes advantage of any already sorted data so that best case complexity is O(n) for an already sorted list (one run). Since we're accumulating both ascending and descending runs, runs will always be at least length 2. (This reduces the maximum stack depth by at least one, paying for the cost of finding the runs in the first place.) Worst case complexity is O(n log n), as expected, for data that is highly randomized.
(Um... Second update.)
Or just see wikipedia on bottom-up mergesort.
You can copy it into an array and then sort the array.
Copying into the array is O(n),
sorting is O(n lg n) (if you use a fast algorithm like merge sort),
copying back to the linked list is O(n) if necessary,
so it is going to be O(n lg n).
Note that if you do not know the number of elements in the linked list, you won't know the size of the array. If you are coding in Java you can use an ArrayList, for example.
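As a rough C sketch of this approach for a list of ints (the int_node type and sort_via_array name are illustrative, not from the question): copy the values out, qsort them, and write them back. It sorts the payloads in place rather than relinking nodes, which is fine for plain ints.

#include <stdlib.h>

typedef struct int_node {
    int value;
    struct int_node *next;
} int_node;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Copies the list's values into an array (O(n)), sorts the array
   (O(n log n)), and copies the sorted values back into the nodes (O(n)). */
void sort_via_array(int_node *head)
{
    size_t n = 0;
    for (int_node *it = head; it != NULL; it = it->next)
        n++;                                      /* count the nodes */
    if (n < 2)
        return;
    int *buf = malloc(n * sizeof *buf);
    size_t i = 0;
    for (int_node *it = head; it != NULL; it = it->next)
        buf[i++] = it->value;                     /* copy values out */
    qsort(buf, n, sizeof *buf, cmp_int);          /* sort the array */
    i = 0;
    for (int_node *it = head; it != NULL; it = it->next)
        it->value = buf[i++];                     /* copy values back */
    free(buf);
}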
The question is LeetCode #148, and there are plenty of solutions offered in all major languages. Mine is as follows, but I'm wondering about the time complexity. In order to find the middle element, we traverse the complete list each time: the first time n elements are iterated over, the second time 2 * (n/2) elements, and so on. It seems to be O(n^2) time.
def sort(linked_list: LinkedList[int]) -> LinkedList[int]:
    # Return n // 2 element
    def middle(head: LinkedList[int]) -> LinkedList[int]:
        if not head or not head.next:
            return head
        slow = head
        fast = head.next
        while fast and fast.next:
            slow = slow.next
            fast = fast.next.next
        return slow

    def merge(head1: LinkedList[int], head2: LinkedList[int]) -> LinkedList[int]:
        p1 = head1
        p2 = head2
        prev = head = None
        while p1 and p2:
            smaller = p1 if p1.val < p2.val else p2
            if not head:
                head = smaller
            if prev:
                prev.next = smaller
            prev = smaller
            if smaller == p1:
                p1 = p1.next
            else:
                p2 = p2.next
        if prev:
            prev.next = p1 or p2
        else:
            head = p1 or p2
        return head

    def merge_sort(head: LinkedList[int]) -> LinkedList[int]:
        if head and head.next:
            mid = middle(head)
            mid_next = mid.next
            # Makes it easier to stop
            mid.next = None
            return merge(merge_sort(head), merge_sort(mid_next))
        else:
            return head

    return merge_sort(linked_list)
Mergesort is the best you can do here. Note that the repeated middle-finding does not make it O(n^2): each level of the recursion traverses O(n) nodes in total to find the middles, and there are O(log n) levels, so the overall running time is still O(n log n).

Find the k-th smallest element recursively in a B-tree with more than one element in a node

Suppose we have the following B-tree.
I would like to create an algorithm to find the k-th smallest element. I tried to implement what was written in this link, but I found that none of the solutions seem to work for this kind of tree.
So far I have done this, which runs fine for the elements of the last branch:
i <- 0
function kthSmallestElement(Node node, int k)
    if (branch[i] != NULL) then
        size <- branch[i].size()
    if (k < size) then
        i++
        call the function recursively for new branch[i], k
    else if (k > size) then
        k -= size
        i++
        call the function recursively for new branch[i], k
    else if (k == size) then
        print branch[i]->entry[k-1]
    else
        print branch[i-1]->entry[k-1]
end function
I have implemented the algorithm using C
#define MAX 4 /* maximum number of keys in node. */
#define MIN 2 /* minimum number of keys in node */

typedef int Key;

typedef struct {
    Key key;
    int value; /* values can be arbitrary */
} Treeentry;

typedef enum {FALSE, TRUE} Boolean;

typedef struct treenode Treenode;

struct treenode {
    int count; /* denotes how many keys there are in the node */
    /*
       The entries at each node are kept in an array entry
       and the pointers in an array branch
    */
    Treeentry entry[MAX+1];
    Treenode *branch[MAX+1];
};

int i = 0;
int size = 0;

void FindKthSmallestElement(Treenode *rootNode, int k){
    if(branch[i] != NULL) //since the node has a child
        size = branch[i]->count;
    if(k < size){
        i++;
        FindKthSmallestElement(branch[i], k);
    }else if(k > size){
        k -= size;
        i++;
        FindKthSmallestElement(branch[i], k);
    }else if (k==size)
        printf ("%d", branch[i]->entry[k-1].key);
    else
        printf ("%d", brach[i-1]->entry[k-1].key);
}
Could you please suggest what I should fix in this in order to get valid output for every k-th smallest element? I tend to believe that this problem cannot be solved recursively, since we have more than one entry in each node. Would it be wise to make it a heap tree like in this link?
This problem can be solved recursively. All you need is for the function to return one of two things:
The k-th smallest key (or a pointer to it) if the tree has k or more keys.
The size of the tree if it has fewer than k keys.
The recursion occurs by calling the function on every subtree of the (root) node, consecutively, from the left-most to the right-most, with a different (decreasing) parameter k:
Let the original/current tree be R; start the recursion by calling the function on R's left-most subtree with the same k that R receives.
If calling the function on a subtree of R successfully returns the k-th smallest key, then that's the answer; return it.
If calling the function on some subtree T of R couldn't find the k-th smallest key, but instead returns the size of T, say n (< k), then:
If T is the right-most subtree, then R has fewer than k keys; return the size of R (found by summing the sizes of all its subtrees and the number of keys in R's root).
If n == k-1, then the k-th smallest key is the key immediately to the right of T.
If n < k-1, then recurse on the subtree S immediately to the right of T with argument k-n-1 (i.e., to find the (k-n-1)-th smallest key in S).
Obviously you'd have to take care of the terminal condition where a tree's root has no more subtrees. Conceptually it may be more easily handled by allowing a NULL subtree, which contains 0 keys.
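A minimal C sketch of this recursion, assuming the Treenode/Treeentry/Boolean definitions from the question, and assuming keys live in entry[1..count] with children in branch[0..count] (adjust the indexing if your node layout differs); the helper name and out-parameters are illustrative:

/* Returns TRUE and writes the k-th smallest key into *out when the subtree
   rooted at node holds at least k keys; otherwise returns FALSE and writes
   the subtree's total key count into *size. */
Boolean kthSmallest(Treenode *node, int k, Key *out, int *size)
{
    int seen = 0;                       /* keys accounted for so far */
    if (node == NULL) {                 /* a NULL subtree holds 0 keys */
        *size = 0;
        return FALSE;
    }
    for (int i = 0; i <= node->count; i++) {
        int subSize;
        /* try to find the (k - seen)-th key in child i */
        if (kthSmallest(node->branch[i], k - seen, out, &subSize))
            return TRUE;
        seen += subSize;
        if (i < node->count) {          /* the key just to the right of branch[i] */
            seen++;
            if (seen == k) {
                *out = node->entry[i + 1].key;
                return TRUE;
            }
        }
    }
    *size = seen;                       /* fewer than k keys in this subtree */
    return FALSE;
}

A caller would invoke it as kthSmallest(root, k, &key, &treeSize) and check the Boolean result to know which of the two outcomes it got.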
Recursively visit every node and add the k smallest elements of the current node to a list. At the end, sort it and take the k-th number.
You could also try comparing the two lists and keeping the k smallest ones each time, but I think it's going to make the code look more complicated and will end up with roughly the same or worse speed, though certainly less memory occupied.

adding two linked lists efficiently in C

I have two linked lists representing the digits of decimal numbers, in order from most significant to least significant, e.g. 4->7->9->6 and 5->7.
The answer should be 4->8->5->3, without reversing the lists, because reversing the lists would decrease efficiency.
I am thinking of solving the problem using a stack. I will traverse both lists and push the data elements onto two separate stacks, one for each linked list. Then I pop both stacks together and add the two elements; if the result is a two-digit number I take it modulo 10 and store the carry in a temp variable. The remainder is stored in the node and the carry is added to the next sum, and so on.
If the two stacks are s1 and s2 and the result linked list is res:

temp = 0;                              /* carry between digits */
res = NULL;                            /* result list, built by prepending */
while (s1->top != -1 && s2->top != -1)
{
    sum = pop(s1) + pop(s2) + temp;
    temp = sum / 10;                   /* carry for the next digit */
    sum = sum % 10;
    n1 = (node*) malloc(sizeof(node));
    n1->data = sum;
    n1->next = res;                    /* digits are popped least-significant first */
    res = n1;
}
/* drain whichever stack still has digits, still propagating the carry */
while (s1->top != -1)
{
    sum = pop(s1) + temp;
    temp = sum / 10;
    sum = sum % 10;
    n1 = (node*) malloc(sizeof(node));
    n1->data = sum;
    n1->next = res;
    res = n1;
}
while (s2->top != -1)
{
    sum = pop(s2) + temp;
    temp = sum / 10;
    sum = sum % 10;
    n1 = (node*) malloc(sizeof(node));
    n1->data = sum;
    n1->next = res;
    res = n1;
}
/* a leftover carry becomes a new most-significant digit */
if (temp > 0)
{
    n1 = (node*) malloc(sizeof(node));
    n1->data = temp;
    n1->next = res;
    res = n1;
}
return res;
I have come across this problem many times in interview questions, but this is the best solution that I could think of.
If anyone can come up with something more efficient in C, I will be very glad.
Two passes, no stack:
Get the length of the two lists.
Create a solution list with one node. Initialize the value of this node to zero. This will hold the carry digit. Set a list pointer (call it the carry pointer) to the location of this node. Set a list pointer (call it the end pointer) to the location of this node.
Starting with the longer list, for each excess node, link a new node to the end pointer and assign it the value of the excess node. Set the end pointer to this new node. If the value is less than 9, set the carry pointer to the new node.
Now we're left with both list pointers having the same number of nodes in each.
While the lists are not empty...
Link a new node to the end pointer and advance the end pointer to this node.
Get the values from each list and advance each list pointer to the next node.
Add the two values together.
If the value is greater than nine, set the value to value mod 10 and increment the value held in the carry pointer's node, then move the carry pointer to the next node; while the carry pointer's node holds a nine, set it to zero and move on to the next node.
If the value is nine, set it and do nothing else.
If the value is less than nine, set it and set the carry pointer to the current node.
When you're done with both lists, check whether the solution pointer's node value is zero. If it is, set the solution pointer to the next node, deleting the unneeded extra digit.
This is how I would go about solving this:
Step 1: Make a pass on both linked lists, find lengths
say len(L1) = m and len(L2) = n
Step 2: Find difference of lengths
if (m > n)
    d = m - n
else if (n > m)
    d = n - m
else
    d = 0
Step 3: Move a temporary pointer d nodes ahead in the larger list
Step 4: Now we have two linked lists to add whose lengths are same, so add them recursively, maintaining a carry.
Step 5:
( Note: if ( d == 0 ) don't perform this step )
After step 4, we've got a partial output list, and now we have to put the remainder of the larger list at the beginning of the output list.
if (d > 0)
    - Traverse the larger list up to d positions, recursively
    - Prepend sum = value_at_end + carry (update carry if sum >= 10) to the beginning of the output list
    - Repeat until the difference is consumed
Note: I'm solving the problem as it's put before me, not by suggesting a change in the underlying data structure.
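For concreteness, here is a rough sketch of step 4 only, assuming a simple digit node like the one in the question (the typedef and helper name below are illustrative) and two lists already trimmed to the same length:

#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

/* Returns the carry out of the current digit pair and builds the
   result list through *out. */
int addSameLength(node *p1, node *p2, node **out)
{
    if (p1 == NULL) {                /* both lists end together */
        *out = NULL;
        return 0;
    }
    node *rest;
    /* recurse to the least-significant end first */
    int carry = addSameLength(p1->next, p2->next, &rest);
    int sum = p1->data + p2->data + carry;
    node *n = (node *) malloc(sizeof(node));
    n->data = sum % 10;
    n->next = rest;                  /* prepend this digit before the tail already built */
    *out = n;
    return sum / 10;                 /* carry propagates toward the most-significant digit */
}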
Time complexity:
Making single passes over both lists to find their lengths: O(m+n)
Summing two linked lists of equal size (m - d and n) recursively: O(n), assuming m > n
Appending the remainder of the larger list to the output list: O(d)
Total: O((m+n) + n + d), i.e. O(m+n)
Space complexity:
Step 2 of the time complexity: O(n) run-time stack space
Step 3 of the time complexity: O(d) run-time stack space
Total: O(n + d), i.e. O(n)
I'd just find the total value of each linked list separately, add them together, then transform that number into a new linked list. So convert 4->7->9->6 and 5->7 to integers with the values 4796 and 57, respectively. Add those together to get 4853, then transform that into a linked list containing 4->8->5->3. You can do the transformations with simple math.
Doing it your way would be a lot easier if you changed the way that the numbers are represented in the first place. Make it so the ones digit is always first, followed by the tens digit, followed by hundreds, etc.
EDIT: Since you're apparently using enormous numbers: have you considered making them doubly-linked lists? Then you wouldn't need to reverse it, per se. Just traverse it backwards.
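A quick sketch of that convert-to-integer idea, which is only viable when the digits fit in a machine integer (so not for the enormous numbers mentioned in the edit); the digit_node type and function names are illustrative:

#include <stdlib.h>

typedef struct digit_node {
    int data;
    struct digit_node *next;
} digit_node;

static unsigned long long listToNumber(const digit_node *p)
{
    unsigned long long v = 0;
    for (; p != NULL; p = p->next)
        v = v * 10 + p->data;       /* most-significant digit comes first */
    return v;
}

static digit_node *numberToList(unsigned long long v)
{
    digit_node *head = NULL;
    do {
        /* prepend each digit, least-significant first, so the head ends
           up holding the most-significant digit */
        digit_node *n = malloc(sizeof *n);
        n->data = (int) (v % 10);
        n->next = head;
        head = n;
        v /= 10;
    } while (v > 0);
    return head;
}

Adding two lists is then numberToList(listToNumber(a) + listToNumber(b)).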
Using a stack is no more efficient than reversing the lists (in fact, it is reversing the lists). If your stack object is dynamically allocated this is no big deal, but if you build it with call recursion, you'll easily get a Stack Overflow of the bad sort. :-)
If you doubly link the lists, you can add the digits and use the backwards links to find out where to put your carried value. http://en.wikipedia.org/wiki/Doubly_linked_list

C get mode from list of integers

I need to write a program to find the mode, that is, the integer or integers that occur most often.
So,
1, 2, 3, 4, 1, 10, 4, 23, 12, 4, 1 would have modes of 1 and 4.
I'm not really sure what kind of algorithm I should use. I'm having a hard time trying to think of something that would work.
I was thinking of a frequency table of some sort, maybe where I could go through the array and build a linked list. If the list doesn't contain that value, add it to the list; if it does, then add 1 to its count.
So, with the same input as above, loop through
1, 2, 3, 4, 1, 10, 4, 23, 12, 4, 1
The list is empty, so add a node with number = 1 and count = 1.
2 doesn't exist, so add a node with number = 2 and count = 1, and so on.
Get to the second 1; 1 already exists, so its count becomes 2.
I would have to loop through the array and then loop through the linked list every time to find that value.
Once I am done, I go through the linked list and create a new linked list that will hold the modes. I set the head to the first element, which is 1. Then I go through the linked list that contains the counts and compare them. If the count of the current node is greater than the current highest, I set the head to this node. If it's equal to the highest, I add the node to the mode linked list.
Once I am done, I loop through the mode list and print the values.
Not sure if this would work. Does anyone see anything wrong with this? Is there an easier way to do it? I was thinking of a hash table too, but I'm not really sure how to do that in C.
Thanks.
If you can keep the entire list of integers in memory, you could sort the list first, which will make repeated values adjacent to each other. Then you can do a single pass over the sorted list to look for the mode. That way, you only need to keep track of the best candidate(s) for the mode seen up until now, along with how many times the current value has been seen so far.
The algorithm you have is fine for a homework assignment. There are all sorts of things you could do to optimise the code, such as:
use a binary tree for efficiency,
use an array of counts where the index is the number (assuming the number range is limited).
But I think you'll find they're not necessary in this case. For homework, the intent is just to show that you understand how to program, not that you know all sorts of tricks for wringing out the last ounce of performance. Your educator will be looking far more for readable, structured, code than tricky optimisations.
I'll describe below what I'd do. You're obviously free to use my advice as much or as little as you wish, depending on how much satisfaction you want to gain at doing it yourself. I'll provide pseudo-code only, which is my standard practice for homework questions.
I would start with a structure holding a number, a count and next pointer (for your linked list) and the global pointer to the first one:
typedef struct sElement {
    int number;
    int count;
    struct sElement *next;
} tElement;

tElement *first = NULL;
Then create some functions for creating and using the list:
tElement *incrementElement (int number);
tElement *getMaxCountElement (void);
tElement *getNextMatching (tElement *ptr, int count);
Those functions will, respectively:
Increment the count for an element (or create it and set count to 1).
Scan all the elements returning the maximum count.
Get the next element pointer matching the count, starting at a given point, or NULL if no more.
The pseudo-code for each:

def incrementElement (number):
    # Find matching number in list or NULL.
    set ptr to first
    while ptr is not NULL:
        if ptr->number is equal to number:
            break
        set ptr to ptr->next
    # If not found, add one at start with zero count.
    if ptr is NULL:
        set ptr to newly allocated element
        set ptr->number to number
        set ptr->count to 0
        set ptr->next to first
        set first to ptr
    # Increment count.
    set ptr->count to ptr->count + 1

def getMaxCountElement ():
    # List empty, no mode.
    if first is NULL:
        return NULL
    # Assume first element is mode to start with.
    set retptr to first
    # Process all other elements.
    set ptr to first->next
    while ptr is not NULL:
        # Save new mode if you find one.
        if ptr->count is greater than retptr->count:
            set retptr to ptr
        set ptr to ptr->next
    # Return actual mode element pointer.
    return retptr

def getNextMatching (ptr, count):
    # Process all elements.
    while ptr is not NULL:
        # If match on count, return it.
        if ptr->count is equal to count:
            return ptr
        set ptr to ptr->next
    # Went through whole list with no match, return NULL.
    return NULL

Then your main program becomes:

# Process all the numbers, adding to (or incrementing in) the list.
for each n in numbers to process:
    incrementElement (n)

# Get the mode element; only look for modes if the list was non-empty.
maxElem = getMaxCountElement ()
if maxElem is not NULL:
    # Find the first one; while one exists, print it and find the next one.
    ptr = getNextMatching (first, maxElem->count)
    while ptr is not NULL:
        print ptr->number
        ptr = getNextMatching (ptr->next, maxElem->count)
If the range of numbers is known in advance, and is a reasonable number, you can allocate a sufficiently large array for the counters and just do count[i] += 1.
If the range of numbers is not known in advance, or is too large for the naive use of an array, you could instead maintain a binary tree of values to maintain your counters. This will give you far less searching than a linked list would. Either way you'd have to traverse the array or tree and build an ordering of highest to lowest counts. Again I'd recommend a tree for that, but your list solution could work as well.
Another interesting option could be the use of a priority queue for your extraction phase. Once you have your list of counters completed, walk your tree and insert each value at a priority equal to its count. Then you just pull values from the priority queue until the count goes down.
I would go for a simple hash table based solution.
A structure for the hash table contains a number and its corresponding frequency, plus a pointer to the next element for chaining in the hash bucket.
struct ItemFreq {
    struct ItemFreq * next_;
    int number_;
    int frequency_;
};
The processing starts with
max_freq_so_far = 0;
It goes through the list of numbers. For each number, the hash table is searched for an ItemFreq element x such that x.number_ == number.
If no such x is found, then an ItemFreq element is created as { number_ = number, frequency_ = 1 } and inserted into the hash table.
If some x was found, then its frequency_ is incremented.
If frequency_ > max_freq_so_far then max_freq_so_far = frequency_.
Once the traversal of the list of numbers is complete, we traverse the hash table and print the ItemFreq items whose frequency_ == max_freq_so_far.
The complexity of the algorithm is O(N) where N is the number of items in the input list.
For a simple and elegant construction of hash table, see section 6.6 of K&R (The C Programming Language).
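A rough sketch of the lookup-or-insert step using the ItemFreq struct above; HASH_SIZE, table, and the function names here are illustrative rather than taken from K&R:

#include <stdlib.h>

#define HASH_SIZE 1024

static struct ItemFreq *table[HASH_SIZE];

static unsigned hash(int number) {
    return ((unsigned) number) % HASH_SIZE;
}

/* Returns the ItemFreq for number, creating it with frequency_ 0 if absent. */
static struct ItemFreq *lookup_or_insert(int number) {
    unsigned h = hash(number);
    struct ItemFreq *p;
    for (p = table[h]; p != NULL; p = p->next_)
        if (p->number_ == number)
            return p;
    p = malloc(sizeof *p);        /* not found: chain a new counter into the bucket */
    p->number_ = number;
    p->frequency_ = 0;
    p->next_ = table[h];
    table[h] = p;
    return p;
}

The main loop then becomes: for each number n, do lookup_or_insert(n)->frequency_ += 1, updating max_freq_so_far as described.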
This response is a sample implementation of Paul Kuliniewicz's idea:

#include <stdlib.h>

int CompInt(const void* ptr1, const void* ptr2) {
    const int a = *(const int*)ptr1;
    const int b = *(const int*)ptr2;
    if (a < b) return -1;
    if (a > b) return +1;
    return 0;
}

// This function leaves the modes in output and returns the number
// of modes in output. The output pointer should be able to
// hold at least n integers. Note that v is sorted in place.
int GetModes(int* v, int n, int* output) {
    // Sort the data and initialize the best result.
    qsort(v, n, sizeof(int), CompInt);
    int outputSize = 0;
    int best = 0;   // size of the largest group found so far
    // Loop through the elements until they are exhausted
    // (note there is no ++i after each iteration).
    for (int i = 0; i < n;) {
        // This is the begin of the new group.
        const int begin = i;
        // Move the index until there are no more equal elements.
        for (; i < n && v[i] == v[begin]; ++i);
        // This is one-past the last element in the current group.
        const int end = i;
        // Update the best mode found until now.
        if (end - begin > best) {
            best = end - begin;
            outputSize = 0;
        }
        if (end - begin == best)
            output[outputSize++] = v[begin];
    }
    return outputSize;
}

Finding the intersecting node from two intersecting linked lists

Suppose there are two singly linked lists both of which intersect at some point and become a single linked list.
The head (start) pointers of both lists are known, but the intersecting node is not. Also, the number of nodes in each list before the intersection is unknown and may differ between the lists, i.e. List1 may have n nodes before it reaches the intersection point and List2 may have m nodes before it reaches the intersection point, where m and n may satisfy
m = n,
m < n or
m > n
One known, easy solution is to compare every node pointer in the first list with every node pointer in the second list; the matching node pointers lead us to the intersecting node. But the time complexity in this case will be O(n^2), which is high.
What is the most efficient way of finding the intersecting node?
This takes O(M+N) time and O(1) space, where M and N are the total lengths of the linked lists. It may be inefficient if the common part is very long (i.e. M, N >> m, n).
Traverse the two linked lists to find M and N.
Go back to the heads, then traverse |M − N| nodes on the longer list.
Now walk in lock step and compare the nodes until you found the common ones.
Edit: See more here.
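A minimal sketch of that walk in C, assuming a simple node type (the lnode typedef and function names are illustrative); if the lists never intersect, both pointers reach NULL together and NULL is returned:

typedef struct lnode {
    int data;
    struct lnode *next;
} lnode;

static int listLength(lnode *p)
{
    int n = 0;
    for (; p != NULL; p = p->next)
        n++;
    return n;
}

lnode *findIntersection(lnode *a, lnode *b)
{
    int la = listLength(a), lb = listLength(b);
    /* advance the longer list by the length difference */
    while (la > lb) { a = a->next; la--; }
    while (lb > la) { b = b->next; lb--; }
    /* walk in lock step until the node pointers coincide */
    while (a != b) {
        a = a->next;
        b = b->next;
    }
    return a;   /* the intersection node, or NULL if there is none */
}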
If possible, you could add a 'color' field or similar to the nodes. Iterate over one of the lists, coloring the nodes as you go. Then iterate over the second list. As soon as you reach a node that is already colored, you have found the intersection.
Dump the contents (or addresses) of both lists into one hash table. The first collision is your intersection.
Check the last node of each list: if there is an intersection, their last nodes will be the same.
This is a crazy solution I found while coding late at night. It is 2x slower than the accepted answer but uses a nice arithmetic hack:
public ListNode findIntersection(ListNode a, ListNode b) {
    if (a == null || b == null)
        return null;

    int A = a.count();
    int B = b.count();

    ListNode reversedB = b.reverse();
    // L = a elements + 1 c element + b elements
    int L = a.count();
    // restore b
    reversedB.reverse();

    // A = a + c
    // B = b + c
    // L = a + b + 1
    int cIndex = ((A + B) - (L - 1)) / 2;
    return a.atIndex(A - cIndex);
}
We split the lists into three parts: a, the part of the first list before the start of the common part; b, the part of the second list before the common part; and c, the common part of the two lists. We count the list sizes, then reverse list b; this means that when we start traversing the list from a we will end at reversedB (we go a -> firstElementOfC -> reversedB). This gives us three equations that let us compute the length of the common part c.
This is too slow for programming competitions or for use in production, but I think the approach is interesting.
Maybe irrelevant at this point, but here's my dirty recursive approach.
This takes O(M) time and O(M) space, where M >= N, for list_M of length M and list_N of length N.
Recursively iterate to the end of both lists, then count back from the end for step 2. Note that list_N will hit null before list_M when M > N.
Same lengths (M = N): they intersect where list_M != list_N && list_M.next == list_N.next
Different lengths (M > N): they intersect where list_N != null
Code Example:
Node yListsHelper(Node n1, Node n2, Node result) {
    if (n1 == null && n2 == null)
        return null;
    yLists(n1 == null ? n1 : n1.next, n2 == null ? n2 : n2.next, result);
    if (n1 != null && n2 != null) {
        if (n2.next == null) { // n1 > n2
            result.next = n1;
        } else if (n1.next == null) { // n1 < n2
            result.next = n2;
        } else if (n1 != n2 && n1.next == n2.next) { // n1 = n2
            result.next = n1.next; // or n2.next
        }
    }
    return result.next;
}
