C get mode from list of integers - c

I need to write a program to find the mode, that is, the integer or integers that occur most often.
So,
1,2,3,4,1,10,4,23,12,4,1 would have mode of 1 and 4.
I'm not really sure what kind of algorithm I should use. I'm having a hard time trying to think of something that would work.
I was thinking of a frequency table of some sort, where I go through the array and build a linked list. If the linked list doesn't contain the value, add it; if it does, add 1 to its count.
So if I had the same input as above, I would loop through
1,2,3,4,1,10,4,23,12,4,1
The list is empty at first, so add a node with number = 1 and value = 1.
2 doesn't exist, so add a node with number = 2 and value = 1, and so on.
When I get to the second 1, it already exists, so its value becomes 2.
I would have to loop through the array and then loop through the linked list every time to find that value.
Once I am done, I go through the linked list and create a new linked list that holds the modes. I set the head to the first element, which is 1. Then I go through the linked list that contains the occurrences and compare the values. If the occurrence count of the current node is greater than the current highest, I set the head to this node. If it is equal to the highest, I add the node to the mode linked list.
Once I am done, I loop through the mode list and print the values.
I'm not sure if this would work. Does anyone see anything wrong with this? Is there an easier way to do this? I was thinking of a hash table too, but I'm not really sure how to do that in C.
Thanks.

If you can keep the entire list of integers in memory, you could sort the list first, which will make repeated values adjacent to each other. Then you can do a single pass over the sorted list to look for the mode. That way, you only need to keep track of the best candidate(s) for the mode seen up until now, along with how many times the current value has been seen so far.

The algorithm you have is fine for a homework assignment. There are all sorts of things you could do to optimise the code, such as:
use a binary tree for efficiency,
use an array of counts where the index is the number (assuming the number range is limited).
But I think you'll find they're not necessary in this case. For homework, the intent is just to show that you understand how to program, not that you know all sorts of tricks for wringing out the last ounce of performance. Your educator will be looking far more for readable, structured, code than tricky optimisations.
I'll describe below what I'd do. You're obviously free to use my advice as much or as little as you wish, depending on how much satisfaction you want to gain at doing it yourself. I'll provide pseudo-code only, which is my standard practice for homework questions.
I would start with a structure holding a number, a count and next pointer (for your linked list) and the global pointer to the first one:
typedef struct sElement {
    int number;
    int count;
    struct sElement *next;
} tElement;
tElement *first = NULL;
Then create some functions for creating and using the list:
tElement *incrementElement (int number);
tElement *getMaxCountElement (void);
tElement *getNextMatching (tElement *ptr, int count);
Those functions will, respectively:
Increment the count for an element (or create it and set count to 1).
Scan all the elements returning the maximum count.
Get the next element pointer matching the count, starting at a given point, or NULL if no more.
The pseudo-code for each:
def incrementElement (number):
    # Find matching number in list or NULL.
    set ptr to first
    while ptr is not NULL:
        if ptr->number is equal to number:
            exit while loop
        set ptr to ptr->next
    # If not found, add one at start with zero count.
    if ptr is NULL:
        set ptr to newly allocated element
        set ptr->number to number
        set ptr->count to 0
        set ptr->next to first
        set first to ptr
    # Increment count.
    set ptr->count to ptr->count + 1
    return ptr

def getMaxCountElement ():
    # List empty, no mode.
    if first is NULL:
        return NULL
    # Assume first element is mode to start with.
    set retptr to first
    # Process all other elements.
    set ptr to first->next
    while ptr is not NULL:
        # Save new mode if you find one.
        if ptr->count is greater than retptr->count:
            set retptr to ptr
        set ptr to ptr->next
    # Return actual mode element pointer.
    return retptr

def getNextMatching (ptr, count):
    # Process all elements.
    while ptr is not NULL:
        # If match on count, return it.
        if ptr->count is equal to count:
            return ptr
        set ptr to ptr->next
    # Went through whole list with no match, return NULL.
    return NULL
Then your main program becomes:
# Process all the numbers, adding to (or incrementing in) list.
for each n in numbers to process:
    incrementElement (n)
# Get the mode quantity, only look for modes if list was non-empty.
maxElem = getMaxCountElement ()
if maxElem is not NULL:
    # Find the first one, while one exists, print it and find the next one.
    ptr = getNextMatching (first, maxElem->count)
    while ptr is not NULL:
        print ptr->number
        ptr = getNextMatching (ptr->next, maxElem->count)
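For readers who want to see the pseudo-code above in C, here is a minimal sketch under a few assumptions of my own: the function name listModes is hypothetical, the list head is passed by pointer instead of being a global, and the maximum count is tracked while inserting rather than in a separate scan. A found element still has to be incremented, so the search loop stops at the match instead of returning early.

```c
#include <stdlib.h>

typedef struct sElement {
    int number;
    int count;
    struct sElement *next;
} tElement;

/* Find the node for number (creating it with a zero count if absent),
   then add one to its count. Returns NULL only if allocation fails. */
static tElement *incrementElement(tElement **first, int number) {
    tElement *ptr = *first;
    while (ptr != NULL && ptr->number != number)
        ptr = ptr->next;
    if (ptr == NULL) {
        ptr = malloc(sizeof *ptr);
        if (ptr == NULL)
            return NULL;
        ptr->number = number;
        ptr->count = 0;
        ptr->next = *first;
        *first = ptr;
    }
    ptr->count++;
    return ptr;
}

/* Stores every mode of numbers[0..n) in out, which needs room for n
   ints. Returns how many modes were stored, or -1 if allocation fails.
   (listModes is a hypothetical name, not from the answer above.) */
int listModes(const int *numbers, int n, int *out) {
    tElement *first = NULL;
    int best = 0, found = 0, failed = 0;

    for (int i = 0; i < n && !failed; i++) {
        tElement *e = incrementElement(&first, numbers[i]);
        if (e == NULL)
            failed = 1;
        else if (e->count > best)
            best = e->count;
    }
    /* Collect the elements whose count equals the maximum, freeing as we go. */
    while (first != NULL) {
        tElement *next = first->next;
        if (!failed && first->count == best)
            out[found++] = first->number;
        free(first);
        first = next;
    }
    return failed ? -1 : found;
}
```

Because new elements are added at the start of the list, the modes come out in reverse order of first appearance. Every lookup walks the list, so the whole thing is O(n * k) for k distinct values, which is fine at homework sizes.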

If the range of numbers is known in advance, and is a reasonable number, you can allocate a sufficiently large array for the counters and just do count[i] += 1.
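A minimal sketch of the counting-array idea, assuming the inputs are known to lie in 0..MAX_VALUE. Both MAX_VALUE and the function name arrayModes are made up for this example.

```c
#define MAX_VALUE 100   /* assumed upper bound on the inputs */

/* Stores every mode of numbers[0..n) in out (room for n ints) and
   returns how many there are, or -1 if a value is out of range. */
int arrayModes(const int *numbers, int n, int *out) {
    int count[MAX_VALUE + 1] = {0};
    int best = 0, found = 0;

    for (int i = 0; i < n; i++) {
        if (numbers[i] < 0 || numbers[i] > MAX_VALUE)
            return -1;
        /* count[i] += 1, remembering the highest count seen so far */
        if (++count[numbers[i]] > best)
            best = count[numbers[i]];
    }
    if (best == 0)
        return 0;               /* empty input has no mode */
    for (int v = 0; v <= MAX_VALUE; v++)
        if (count[v] == best)
            out[found++] = v;
    return found;
}
```

The modes come out in increasing order because the final loop walks the counters by value.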
If the range of numbers is not known in advance, or is too large for the naive use of an array, you could instead maintain a binary tree of values to maintain your counters. This will give you far less searching than a linked list would. Either way you'd have to traverse the array or tree and build an ordering of highest to lowest counts. Again I'd recommend a tree for that, but your list solution could work as well.
Another interesting option could be the use of a priority queue for your extraction phase. Once you have your list of counters completed, walk your tree and insert each value at a priority equal to its count. Then you just pull values from the priority queue until the count goes down.

I would go for a simple hash table based solution.
A structure for hash table containing a number and corresponding frequency. Plus a pointer to the next element for chaining in the hash bucket.
struct ItemFreq {
struct ItemFreq * next_;
int number_;
int frequency_;
};
The processing starts with
max_freq_so_far = 0;
It goes through the list of numbers. For each number, the hash table is searched for an ItemFreq element x such that x.number_ == number.
If no such x is found, then an ItemFreq element is created as { number_ = number, frequency_ = 1} and inserted into the hash table.
If some x was found then its frequency_ is incremented.
If frequency_ > max_freq_so_far then max_freq_so_far = frequency_
Once the traversal of the list of numbers is complete, we traverse the hash table and print the ItemFreq items whose frequency_ == max_freq_so_far
The complexity of the algorithm is O(N) where N is the number of items in the input list.
For a simple and elegant construction of hash table, see section 6.6 of K&R (The C Programming Language).
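The steps above might look like this in C. This is only a sketch: the bucket count NBUCKETS, the modulo hash, and the function name hashModes are my own assumptions, and the O(N) bound holds only on average, when the chains stay short.

```c
#include <stdlib.h>

#define NBUCKETS 101    /* assumed table size */

struct ItemFreq {
    struct ItemFreq *next_;
    int number_;
    int frequency_;
};

/* Stores every mode of numbers[0..n) in out (room for n ints) and
   returns how many there are, or -1 if allocation fails. */
int hashModes(const int *numbers, int n, int *out) {
    struct ItemFreq *table[NBUCKETS] = {NULL};
    int max_freq_so_far = 0, found = 0, ok = 1;

    for (int i = 0; i < n && ok; i++) {
        unsigned b = (unsigned)numbers[i] % NBUCKETS;
        struct ItemFreq *x = table[b];
        while (x != NULL && x->number_ != numbers[i])
            x = x->next_;
        if (x == NULL) {
            /* Not seen before: chain a new item into the bucket. */
            x = malloc(sizeof *x);
            if (x == NULL) {
                ok = 0;
                break;
            }
            x->number_ = numbers[i];
            x->frequency_ = 0;
            x->next_ = table[b];
            table[b] = x;
        }
        if (++x->frequency_ > max_freq_so_far)
            max_freq_so_far = x->frequency_;
    }

    /* Walk the table once, reporting the modes and freeing every item. */
    for (unsigned b = 0; b < NBUCKETS; b++) {
        struct ItemFreq *x = table[b];
        while (x != NULL) {
            struct ItemFreq *next = x->next_;
            if (ok && x->frequency_ == max_freq_so_far)
                out[found++] = x->number_;
            free(x);
            x = next;
        }
    }
    return ok ? found : -1;
}
```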

This is a sample implementation of Paul Kuliniewicz's idea (sort first, then scan):
#include <stdlib.h>

int CompInt(const void* ptr1, const void* ptr2) {
    const int a = *(const int*)ptr1;
    const int b = *(const int*)ptr2;
    if (a < b) return -1;
    if (a > b) return +1;
    return 0;
}
// This function leaves the modes in output and returns the number
// of modes found. The output pointer must have room for at least
// n integers. Note that v is sorted in place.
int GetModes(int* v, int n, int* output) {
    // Sort the data and initialize the best result.
    qsort(v, n, sizeof *v, CompInt);
    int best = 0;
    int outputSize = 0;
    // Loop through the elements until they are exhausted
    // (note there is no ++i in the loop header).
    for (int i = 0; i < n;) {
        // This is the beginning of the new group.
        const int begin = i;
        // Advance i until there are no more equal elements.
        for (; i < n && v[i] == v[begin]; ++i);
        // This is one past the last element in the current group.
        const int end = i;
        // Update the best mode found so far.
        if (end - begin > best) {
            best = end - begin;
            outputSize = 0;
        }
        if (end - begin == best)
            output[outputSize++] = v[begin];
    }
    return outputSize;
}

Related

Deduplication optimization

The problem is as follows. I want a function that, given a list and a maximum number of occurrences "x", deletes all elements of the list that appear x or more times.
I found a pretty straightforward solution, which is to check each of the elements. That said, repeating the find and delete functions so many times does not seem computationally optimal to me.
I was wondering whether you could provide a better algorithm (I ruled out allocating a count array spanning the min to the max: that is too much for the task if, say, you have a few very big numbers and your memory can't hold it).
My code follows.
typedef struct n_s
{
    int val;
    struct n_s *next;
}
n_t;
// deletes all elements equal to del in list with head h
n_t * delete(n_t *h, int del);
// returns the first occurrence of find in list with head h, otherwise gives NULL
n_t * find(n_t *h, int find);
n_t *
delFromList(n_t *h, int x)
{
    int val;
    n_t *el, *posInter;
    // empty list case
    if (h == NULL)
        return NULL;
    // first element
    val=h->val;
    if ( (posInter = find(h -> next,val))
        && (find(posInter -> next, val)))
        h = delete(h, val);
    // loop from second element
    el = h;
    while (el -> next)
    {
        val = el -> next -> val;
        // check whether you want to delete the next one,
        // and then if you do so, check again on the "new" next one
        if ((posInter = find(el -> next -> next, val))
            && (find(posInter -> next, val)))
            el -> next = delete(el -> next, val);
        // in case you did not delete the next node, you can move on
        else
            el = el -> next;
    }
    return h;
}
I know that the el->next->next may look confusing, but I find it less intuitive to use variables such as "next", "past"... so, sorry for your headache.
One option for an algorithm with improved performance is:
Define a data structure D with two members, one for the value of a list element and one to count the number of times it appears.
Initialize an empty balanced tree ordered by value.
Iterate through the list. For each item in the list, look it up in the tree. If it is not present, insert a D structure into that tree with its value member copied from the list element and its count set to one. If it is present in the tree, increment its count. If its count equals or exceeds the threshold, remove it from the list.
Lookups and insertions in a balanced tree are O(log n). A linked list of n items needs n of them, and deletions from a linked list are O(1). So the total time is O(n log n).
Use a counting map to count the number of times each element appears. The keys are the elements, and the values are the counts.
Then, go through your array a second time, deleting anything which meets your threshold.
O(n) time, O(n) extra space.
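C has no built-in map, so the sketch below swaps in a sorted copy of the values as the counting structure and looks counts up by binary search. That makes it O(n log n) rather than the O(n) a hash map would give, but the count-then-delete shape is the same. The name delFrequent is hypothetical, and the code assumes every node was allocated with malloc.

```c
#include <stdlib.h>

typedef struct n_s {
    int val;
    struct n_s *next;
} n_t;

static int cmpInt(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Occurrences of val in sorted[0..n), found with two binary searches. */
static size_t countOf(const int *sorted, size_t n, int val) {
    size_t lo = 0, hi = n;
    while (lo < hi) {               /* first index with sorted[i] >= val */
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < val) lo = mid + 1; else hi = mid;
    }
    size_t first = lo;
    hi = n;
    while (lo < hi) {               /* first index with sorted[i] > val */
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= val) lo = mid + 1; else hi = mid;
    }
    return lo - first;
}

/* Removes every node whose value occurs x or more times and returns
   the new head. On allocation failure the list is left untouched. */
n_t *delFrequent(n_t *h, size_t x) {
    size_t n = 0, i = 0;
    for (n_t *p = h; p != NULL; p = p->next)
        n++;
    if (n == 0)
        return NULL;
    int *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return h;
    /* First pass: collect and sort the values so counts can be looked up. */
    for (n_t *p = h; p != NULL; p = p->next)
        sorted[i++] = p->val;
    qsort(sorted, n, sizeof *sorted, cmpInt);

    /* Second pass: unlink every node that meets the threshold. */
    n_t **link = &h;
    while (*link != NULL) {
        n_t *cur = *link;
        if (countOf(sorted, n, cur->val) >= x) {
            *link = cur->next;
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    free(sorted);
    return h;
}
```

Walking the list through a pointer to the link (n_t **) avoids special-casing the head, which is what forces the separate "first element" branch in the question's code.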

removing set interval of struct

I'm having trouble "removing" entries from my array of structs. Right now I can define the maximum array size to be 10. I can fill the array with structs containing name, age, etc. My search function lets me search within an interval, say ages 10 to 25. What I want my remove function to do is remove all the people between ages 10 and 25. I should then be able to enter new people into the database as long as it doesn't exceed my defined limit. Right now it seems to randomly remove stuff from the array.
struct database
{
    float age,b,c,d;
    char name[WORDLENGTH];
};
typedef struct database Database;
search func();
.........
void remove(Database inv[], int *np, int *low, int *high, int *option)
{
    int i;
    if (*np == 0)
    {
        printf("The database is empty\n");
        return;
    }
    search(inv, *np, low, high, option);
    if (*option == 1)
    {
        for (i = 0; i<*np; i++)
        {
            if (inv[i].age >= *low && inv[i].age <= *high)
            {
                (*np)--;
            }
        }
    }
}
Right now it seems to randomly remove stuff from the array.
The items that your code removes are not random at all. This line
(*np)--;
removes the last item. Therefore, if the range contains two items that match the search condition at the beginning of the inv, your code would remove two items off the end. Things get a little more complicated if matching items are located in the back of the valid range of inv, so deletions start looking random.
Deleting from an array of structs is not different from deleting from an array of ints. You need to follow this algorithm:
Maintain a read index and a write index, initially set to zero
Run a loop that terminates when the read index goes past the end
At each step check the item at read index
If the item does not match the removal condition, copy from read index to write index, and advance both indexes
Otherwise, advance only the read index
Set new np to the value of write index at the end of the loop.
This algorithm ensures that items behind the deleted ones get moved toward the front of the array. See this answer for an example implementation of the above approach.
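A minimal sketch of the read/write index approach applied to the question's struct. WORDLENGTH is given an assumed value here because the question does not show it, and removeAgeRange is a hypothetical name; it returns the new count instead of updating *np through a pointer.

```c
#define WORDLENGTH 32   /* assumed; the question does not show its value */

struct database {
    float age, b, c, d;
    char name[WORDLENGTH];
};
typedef struct database Database;

/* Drops every entry with low <= age <= high, moving the survivors
   toward the front of the array, and returns the new element count. */
int removeAgeRange(Database inv[], int np, float low, float high) {
    int write = 0;
    for (int read = 0; read < np; read++) {
        if (inv[read].age < low || inv[read].age > high) {
            /* Keep this entry: copy it down only if a gap has opened up. */
            if (write != read)
                inv[write] = inv[read];
            write++;
        }
        /* Otherwise advance only the read index, skipping the entry. */
    }
    return write;
}
```

The caller would then store the result back, for example `*np = removeAgeRange(inv, *np, 10, 25);`, after which the slots from the new count onward are free for new entries.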
You can't remove an array element simply by decreasing the count of number of elements.
If you want to remove the n'th element in the array, you have to overwrite the n'th element with the (n+1)'th element and overwrite the (n+1)'th element with the (n+2)'th element and so on.
Something like:
int arr[5] = { 1, 2, 3, 4, 5};
int np = 5;
// Remove element 3 (aka index 2)
int i;
for (i = 2; i < (np-1); ++i)
{
    arr[i] = arr[i+1];
}
--np;
This is a simple approach to explain the concept. But notice that it requires a lot of copying, so in real code you should use a better algorithm (if performance is an issue). The answer from @dasblinkenlight explains one good algorithm.

Multiplying pointer address

So basically bst in this case is a float pointer (I am using it for an array implementation of a binary search tree). I need to insert elements, with left node = 2i and right node = 2i+1, so bst=bst*2 means I am trying to change the pointer to the current index of the array. It doesn't work. How do I make it work? Addition on pointers is allowed, yet multiplication is not. Why?!
For more information, see my whole program.
Multiplication on pointers isn't allowed because there's no use for it. If you have a variable that's located at, say, address 0x4f2c18e9 — that's a pointer value — there's no reason to expect that address 0x9e5831d2 (that's 0x4f2c18e9 * 2) is a valid place to store data.
bst + 1 means "the next float after this float": the compiler knows bst is a pointer to float, so it knows how far to move to reach the next element.
But what would "multiplying a pointer" be? Does it have a meaning?
If you mean twice the distance from the beginning of an array, then that is an index concept. With an array index such as arr[i * 2 + 1], the expression i * 2 + 1 is calculated first, which gives the distance, and the result is then applied by the [] operator.
So, if you want to do something like this, you should first calculate the distance from this element to the beginning of the array, then do your multiplication on that distance, and finally add the result back to the beginning. Sample code follows:
BStree bst, h; /* h points to the beginning of the array */
......
size_t i = bst - h;
BStree pLeftNode = h + i * 2;
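To make the index idea concrete, here is a small sketch of an insert that only ever multiplies indices, never pointers. It makes several assumptions not stated in the question: the root sits at index 0 (so the children of i are 2i+1 and 2i+2 instead of 2i and 2i+1), empty slots hold the sentinel -1, and the name bstInsert is hypothetical.

```c
#include <stddef.h>

#define EMPTY (-1.0f)   /* assumed sentinel marking an unused slot */

/* Inserts key into the tree stored in tree[0..capacity). Returns 1 if
   the key was stored or was already present, 0 if its slot would fall
   outside the array. */
int bstInsert(float *tree, size_t capacity, float key) {
    size_t i = 0;                       /* start at the root */
    while (i < capacity) {
        if (tree[i] == EMPTY) {
            tree[i] = key;
            return 1;
        }
        if (tree[i] == key)
            return 1;                   /* duplicate, nothing to do */
        /* Multiply the index, then let tree[i] do the pointer addition. */
        i = (key < tree[i]) ? 2 * i + 1 : 2 * i + 2;
    }
    return 0;
}
```

The bounds check matters because an unbalanced tree stored this way can need an index far beyond the number of keys actually inserted.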
What you're doing is using recursion to keep the base on the bottom of the stack and pushing on the next location using your BStree. You just pass on the next location using a simple test:
void bstree_insert(BStree bst, float key){
    long int ibst = (int)bst;
    ibst *= 2;
    float * nbst = (float *)ibst;
    if(*bst == -1)
        *bst = key;
    else if(*bst == key)
        ;
    else if(*bst > key)
        bstree_insert(nbst+1, key);
    else if(*bst < key)
        bstree_insert(nbst+2, key);
    return;
}
From the way you've described it, this is what you want.
HOWEVER!
I don't think your instructor would want you messing with memory like this. It's not something any level of programmer should be doing unless you want your computer to turn into a transformer, change into Optimus Prime, and kick your tush into next year.
So...I would suggest that you go back and carefully re-read the instructions for your problem.

find element in the middle of a stack

I was asked this question in an interview. The problem was that I would be given a stack and have to find the element in the middle position of the stack. The "top" index is not available (so that you don't pop() top/2 times and return the answer). Assume that you will reach the bottom of the stack when pop() returns -1. Don't use any additional data structure.
Eg:
stack   index
-----
  2     nth element
  3
 99
  .
  1     n/2 th element
  .
 -1     bottom of the stack (0th index)
Answer: 1 (I didn't mean the median. Find the element in the middle position.)
Is recursion the only way?
Thanks,
psy
Walk through the stack, calculate the depth and on the way back return the appropriate element.
int middle(stack* s, int n, int* depth) {
    if (stack_empty(s)) {
        *depth = n;
        return 0; //return something, doesn't matter..
    }
    int val = stack_pop(s);
    int res = middle(s, n+1, depth);
    stack_push(s, val);
    if (n == *depth/2)
        return val;
    return res;
}
int depth;
middle(&stack, 0, &depth);
Note: yes, recursion is the only way. Not knowing the depth of the stack means you have to store those values somewhere.
Recursion is never the only way ;)
However, recursion provides you with an implied additional stack (i.e. function parameters and local variables), and it does appear that you need some additional storage to store traversed elements, in that case it appears that recursion may be the only way given that constraint.
"... Don't use any additional data structure. ..."
Then the task is unsolvable, because you need some place where to store the popped-out data. You need another stack for recursion, which is also a data structure. It doesn't make sense to prohibit any data structure and allow recursion.
Here is one solution: Take two pointers, advance one of them two steps at a time (fast), the other one only one step at a time (slow). If the fast one reaches the bottom return the slow pointer which points to the middle index. No recursion required.
int * slow = stack;
int * fast = stack;
while(1) {
    if(STACK_BOTTOM(fast)) return slow;
    fast--;
    if(STACK_BOTTOM(fast)) return slow;
    slow--;
    fast--;
}
Recursion seems to be the only way. If you try to use the fast and slow pointer concept during popping, you will need to store the values somewhere and that violates the requirement of no additional data structure.
This question is tagged with c, so for the c programming language I agree that recursion is the only way. However, if first class anonymous functions are supported, you can solve it without recursion. Some pseudo code (using Haskell's lambda syntax):
n = 0
f = \x -> 0 # constant function that always returns 0
while (not stack_empty) do
    x = pop
    n = n+1
    f = \a -> if (a == n) then x else f(a)
middle = f(n/2) # middle of stack
# stack is empty, rebuild it up to middle if required
for x in (n .. n/2) do push(f(x))
Please note: during the while loop, there's no (recursive) call of f. f(a) in the else branch is just used to construct a new(!) function, which is called f again.
Assuming the stack has 3 elements 10, 20, 30 (from bottom to top), this basically constructs the lambda
(\a -> if a==1
       then 30
       else (\b -> if b==2
                   then 20
                   else (\c -> if c==3
                               then 10
                               else (\d -> 0)(c)
                        )
                        (b)
            )
            (a)
)
or a little bit more readable
f(x) = if x==1 then 30 else (if x==2 then 20 else (if x==3 then 10 else 0))

adding two linked lists efficiently in C

I have two linked lists representing the digits of decimal numbers in order from most to least significant, e.g. 4->7->9->6 and 5->7.
The answer should be 4->8->5->3, without reversing the lists, because reversing the lists would decrease efficiency.
I am thinking of solving the problem using a stack. I will traverse both lists and push the data elements onto two separate stacks, one for each linked list. Then I pop both stacks together and add the two elements, and if the result is a two-digit number I take it modulo 10 and store the carry in a temp variable. The remainder is stored in the node and the carry is added to the next sum, and so on.
Suppose the two stacks are s1 and s2 and the result linked list is res.
temp = 0;      /* carry into the next digit */
res = NULL;
/* keep going while either stack has digits or a carry is pending */
while (s1->top != -1 || s2->top != -1 || temp != 0)
{
    /* an exhausted stack contributes nothing to the sum */
    sum = temp;
    if (s1->top != -1)
        sum += pop(s1);
    if (s2->top != -1)
        sum += pop(s2);
    temp = sum / 10;
    n1 = (node*)malloc(sizeof(node));
    n1->data = sum % 10;
    /* prepend, so the most significant digit ends up at the head */
    n1->next = res;
    res = n1;
}
return res;
I have come across this problem many times in interview questions but this is the best solution that I could think of.
If anyone can come up with something more efficient in C I will be very glad.
Two passes, no stack:
Get the length of the two lists.
Create a solution list with one node. Initialize the value of this node to zero. This will hold the carry digit. Set a list pointer (call it the carry pointer) to the location of this node. Set a list pointer (call it the end pointer) to the location of this node.
Starting with the longer list, for each excess node, link a new node to the end pointer and assign it the value of the excess node. Set the end pointer to this new node. If the value is less than 9, set the carry pointer to the new node.
Now we're left with both list pointers having the same number of nodes in each.
While the lists are not empty...
Link a new node to the end pointer and advance the end pointer to this node.
Get the values from each list and advance each list pointer to the next node.
Add the two values together.
If value is greater than nine, set the value to value mod 10, increment the value held in the carry pointer's node, move the carry pointer to the next node. If carry pointer's value is nine, set to zero and go to next node.
If value is nine. Set it. Do nothing else.
If value is less than nine. Set it. Set carry pointer to current node.
When you're done with both lists, check if the solution pointer's node value is zero. If it is, set the solution pointer to the next node, deleting the unneeded extra digit.
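The two-pass scheme above can be sketched in C as follows. The node layout and the names addLists, appendDigit and freeList are assumptions for illustration, and the inputs are assumed to hold one decimal digit (0 to 9) per node. The key invariant is that every node after the carry pointer holds a 9, so a carry bumps the carry node and turns those 9s into 0s.

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

static size_t listLength(const node *p) {
    size_t n = 0;
    for (; p != NULL; p = p->next)
        n++;
    return n;
}

static void freeList(node *p) {
    while (p != NULL) {
        node *next = p->next;
        free(p);
        p = next;
    }
}

/* Links a new digit after *end and advances *end; returns 0 on failure. */
static int appendDigit(node **end, int digit) {
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->data = digit;
    n->next = NULL;
    (*end)->next = n;
    *end = n;
    return 1;
}

/* Returns a newly allocated list holding a + b, or NULL on failure. */
node *addLists(const node *a, const node *b) {
    size_t la = listLength(a), lb = listLength(b);
    if (la < lb) {                      /* make a the longer list */
        const node *t = a; a = b; b = t;
        size_t tl = la; la = lb; lb = tl;
    }
    node *head = malloc(sizeof *head);  /* holds a possible final carry */
    if (head == NULL)
        return NULL;
    head->data = 0;
    head->next = NULL;
    node *carry = head, *end = head;

    /* Copy the excess leading digits of the longer list. */
    for (size_t i = lb; i < la; i++, a = a->next) {
        if (!appendDigit(&end, a->data)) { freeList(head); return NULL; }
        if (end->data < 9)
            carry = end;
    }
    /* Add the aligned digits pairwise. */
    for (; a != NULL; a = a->next, b = b->next) {
        int sum = a->data + b->data;
        if (!appendDigit(&end, sum % 10)) { freeList(head); return NULL; }
        if (sum > 9) {
            carry->data++;              /* absorb the carry here */
            for (node *p = carry->next; p != end; p = p->next)
                p->data = 0;            /* the 9s in between roll over */
            carry = end;
        } else if (sum < 9) {
            carry = end;
        }                               /* sum == 9: carry stays put */
    }
    /* Drop the extra leading node if no carry reached it. */
    if (head->data == 0 && head->next != NULL) {
        node *t = head;
        head = head->next;
        free(t);
    }
    return head;
}
```

Each input list is walked twice (once for its length, once for the digits) and each result node is rewritten at most once by a carry, so the running time stays linear in the total number of digits.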
This is how I would go about solving this:
Step 1: Make a pass on both linked lists, find lengths
say len(L1) = m and len(L2) = n
Step 2: Find difference of lengths
if ( m > n )
    d = m - n
else if ( n > m )
    d = n - m
else
    d = 0
Step 3: Move a temporary pointer d ahead of the larger list
Step 4: Now we have two linked lists to add whose lengths are same, so add them recursively, maintaining a carry.
Step 5:
( Note: if ( d == 0 ) don't perform this step )
After step 4, we've got partial output list, and now we have to put remaining of the larger list at the beginning of output list.
if ( d > 0 )
-Travel larger list till d positions recursively
-Append sum = value_at_end + carry (update carry if sum >= 10) to output list at beginning
-Repeat until difference is consumed
Note: I'm solving the problem as it's put before me, not by suggesting a change in the underlying data structure.
Time complexity:
Making single passes on both the lists to find their lengths: O(m+n)
Summing two linked lists of equal size (m - d and n) recursively: O(n), assuming m > n
Appending remaining of larger list to output list: O(d)
Total: O( (m+n) + (n) + (d) ) OR O(m+n)
Space complexity:
step 2 of time complexity: O(n), run time stack space
step 3 of time complexity: O(d), run time stack space
Total: O(n + d) OR O(n)
I'd just find the total value of each linked list separately, add them together, then transform that number into a new linked list. So convert 4->7->9->6 and 5->7 to integers with the values 4796 and 57, respectively. Add those together to get 4853, then transform that into a linked list containing 4->8->5->3. You can do the transformations with simple math.
Doing it your way would be a lot easier if you changed the way that the numbers are represented in the first place. Make it so the ones digit is always first, followed by the tens digit, followed by hundreds, etc.
EDIT: Since you're apparently using enormous numbers: have you considered making them doubly-linked lists? Then you wouldn't need to reverse it, per se. Just traverse it backwards.
Using a stack is no more efficient than reversing the lists (actually it is reversing the lists). If your stack object is dynamically allocated this is no big deal, but if you create it with call recursion, you'll easily get Stack Overflow of the bad sort. :-)
If you doubly link the lists, you can add the digits and use the backwards links to find out where to put your carried value. http://en.wikipedia.org/wiki/Doubly_linked_list

Resources