O(n) linear search of an array for most common items

I'm trying to write pseudocode for an O(n) algorithm that searches a sorted array for the most frequently occurring elements.
I'm very new to data structures and algorithms and I haven't coded for about two and a half years. I have done some reading around the subject and I believe I'm grasping the concepts, but I'm struggling with the above problem.
This is all I have so far. I'm struggling to get the desired result without the second "for" loop, which I believe makes the algorithm O(n^2), and I'm not too sure how I would deal with more than one most frequently occurring element.
Any help or direction as to where I can get help would be greatly appreciated.
A = [...];   // the input array of n elements
Elem = 0;
Count = 0;
for (j = 0; j < n; j++) {
    tempElem = A[j];
    tempCount = 0;
    for (p = 0; p < n; p++) {
        if (A[p] == tempElem)
            tempCount++;
    }
    if (tempCount > Count) {
        Elem = tempElem;
        Count = tempCount;
    }
}
print("The most frequent element of array A is", Elem, "as it appears", Count, "times");

The inner loop is not your friend. :-)
Your loop body should key on just two bits of logic:
Is this element the same as the previous one?
If so, increment the count for the current item (curr_count) and go to the next element.
Otherwise, check curr_count against the best so far; if it's better, record the previous element and curr_count as the new "best" data.
Either way, set the count back to 1 and go to the next element.
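To make that concrete, here is a minimal sketch of the single pass in C (the variable names are mine). It assumes the array is sorted and non-empty, and on ties it reports the first most-frequent value; tracking several equally frequent elements would take a second pass using the final best_count:

#include <stdio.h>

int main(void) {
    int A[] = {1, 2, 2, 3, 3, 3, 5, 5};
    int n = sizeof A / sizeof A[0];

    int best_elem = A[0], best_count = 1;   /* best run seen so far */
    int curr_count = 1;                     /* length of the current run */

    for (int j = 1; j < n; j++) {
        if (A[j] == A[j - 1])
            curr_count++;                   /* same as previous: extend the run */
        else
            curr_count = 1;                 /* new value: start a fresh run */

        if (curr_count > best_count) {      /* keep the best run up to date */
            best_count = curr_count;
            best_elem = A[j];
        }
    }

    printf("The most frequent element of array A is %d as it appears %d times\n",
           best_elem, best_count);
    return 0;
}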

Related

efficient way to remove duplicate pairs of integers in C

I've found answers to similar problems, but none of them exactly described my problem.
So at the risk of being downvoted to hell, I was wondering if there is a standard method to solve my problem. Further, there's a chance that I'm asking the wrong question. Maybe the problem can be solved more efficiently another way.
So here's some background:
I'm looping through a list of particles. Each particle has a list of its neighbouring particles. Now I need to create a list of unique particle pairs of mutual neighbours.
Each particle can be identified by an integer number.
Should I just build a list of all the pairs, including duplicates, and use some kind of sort & comparator to eliminate duplicates, or should I try to avoid adding duplicates to my list in the first place?
Performance is really important to me. I guess most of the loops may be vectorized and threaded. On average each particle has around 15 neighbours, and I expect that there will be 1e6 particles at most.
I do have some ideas, but I'm not an experienced coder and I don't want to waste a week benchmarking every single method in different situations just to find out that there's already a standard method for my problem.
Any suggestions?
BTW: I'm using C.
Some pseudo-code
for i in nparticles
    particle = particles[i];  // just an array containing the "index" of each particle
    // each particle has a neighbour-list
    for k in neighlist[i]     // looping through all the neighbours
        // k represents the index of the neighbour of particle "i"
        if the pair (i,k) or (k,i) is not already in the pair-list, add it; otherwise don't
Sorting the elements on each iteration is not a good idea, since a comparison sort costs O(n log n).
The next best thing would be to store the items in a search tree, better yet a binary search tree, and better still a self-balancing binary search tree; you can find implementations on GitHub.
An even better solution would give an access time of O(1). You can achieve this in two different ways. One is a simple identity array, where at each slot you store, say, a pointer to the item if there is one at this id, or some flag indicating that the id is empty. This is very fast but wasteful: you'll need O(N) memory.
The best solution in my opinion would be to use a set or a hash-map, which are basically the same thing, since a set can be implemented using a hash-map.
Here is a GitHub project with a C hash-map implementation.
And a Stack Overflow answer to a similar question.
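As a rough sketch of the set idea in C (everything here is my own assumption: packing a pair into a 64-bit key, the fixed power-of-two capacity, and the open-addressing scheme): canonicalize each pair as (min, max) so (i,k) and (k,i) collide on purpose, then insert keys into a small hash set:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Canonicalize the pair so (i,k) and (k,i) map to the same 64-bit key. */
static uint64_t pair_key(uint32_t a, uint32_t b) {
    uint32_t lo = a < b ? a : b;
    uint32_t hi = a < b ? b : a;
    return ((uint64_t)lo << 32) | hi;
}

/* Minimal open-addressing hash set. cap must be a power of two; 0 marks an
   empty slot, which is safe as long as a particle is never its own
   neighbour (then lo < hi, so a key is never 0). */
typedef struct { uint64_t *slots; size_t cap; } pairset;

static int pairset_add(pairset *s, uint64_t key) {
    size_t i = (size_t)(key * 0x9E3779B97F4A7C15ull) & (s->cap - 1);
    while (s->slots[i] != 0) {
        if (s->slots[i] == key) return 0;   /* pair already seen */
        i = (i + 1) & (s->cap - 1);         /* linear probing */
    }
    s->slots[i] = key;
    return 1;                               /* newly inserted */
}

int main(void) {
    size_t cap = 16;                        /* pick ~2x the expected pair count */
    pairset s = { calloc(cap, sizeof(uint64_t)), cap };
    uint32_t pairs[][2] = { {1, 2}, {2, 1}, {0, 3}, {3, 0}, {1, 2} };
    for (int i = 0; i < 5; i++)
        if (pairset_add(&s, pair_key(pairs[i][0], pairs[i][1])))
            printf("new pair (%u,%u)\n", pairs[i][0], pairs[i][1]);
    free(s.slots);                          /* prints (1,2) and (0,3) only */
    return 0;
}

Note that if the neighbour relation is guaranteed to be mutual, you can skip the lookup entirely and just keep the pairs with i < k; each pair is then added exactly once.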

Number of swaps performed by Bubble-Sort on a given array

I have been given an array and I'm asked to find the number of swaps required to sort the array using bubble sort.
Now, we know that we can find the number of comparisons by n(n-1)/2, but what I need is the number of actual swaps.
My first instinct was to run bubble sort and, with each swap(), increment a swap variable. But this is a very slow process, and I'd like your help finding an optimized way to solve my dilemma.
P.S.: I also need to compare whether it is faster to sort the array in ascending or in descending order... Sorting it twice doubles the time.
Edit:
Sorry if I wasn't clear enough. I want to find the number of swaps without running bubble sort at all.
Regard each swap() applied to a[i] and a[i+1] as a bubble-up of a[i].
Now, asking how many swaps are going to happen is the same as asking how many bubble-up operations are going to happen. Well, and how many do we have of those?
Each a[i] will bubble up once for each position j > i where a[j] < a[i]. In words: a[i] will bubble up once for each position to its right where the element's value at that position is smaller than a[i] itself. A pair of elements satisfying this condition is what is known as an inversion of a[].
So reformulating your question we could ask: What is the total number of inversions in a[]? (a.k.a. What is the inversion number of a[]?)
This is a well-known problem, and besides some obvious approaches running in O(n^2), the typical approach is to tweak merge sort a bit in order to find this number. And since merge sort runs in O(n*log(n)), you get the same running time for finding the inversion number of a[].
Now that you know that you can tweak merge-sort, I suggest you give it a try on your own on how to do it exactly.
Hint: The main question you have to answer is: when positioning a single element during a merge step of two arrays, how many inversions did I fix? Then simply add up all of those.
In case you still are stuck after giving it some thought, you can have a look at some full blown solutions here:
http://www.geeksforgeeks.org/counting-inversions/
Counting inversions in an array
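If you just want to see the shape of the tweak, here is a compact sketch in C (my own code, not taken from the links above). The key observation: when the merge step takes an element from the right half, it jumps over everything remaining in the left half, fixing exactly that many inversions:

#include <stdio.h>
#include <string.h>

/* Merge a[lo..mid) and a[mid..hi) via tmp, counting the inversions
   that this merge step fixes. */
static long long merge_count(int a[], int tmp[], int lo, int mid, int hi) {
    long long inv = 0;
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j])
            tmp[k++] = a[i++];
        else {
            tmp[k++] = a[j++];
            inv += mid - i;   /* a[j] was smaller than all of a[i..mid) */
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
    return inv;
}

static long long count_inversions(int a[], int tmp[], int lo, int hi) {
    if (hi - lo < 2) return 0;
    int mid = lo + (hi - lo) / 2;
    return count_inversions(a, tmp, lo, mid)
         + count_inversions(a, tmp, mid, hi)
         + merge_count(a, tmp, lo, mid, hi);
}

int main(void) {
    int a[] = {5, 2, 4, 1, 3};
    int tmp[5];
    /* prints 7: the same number of swaps bubble sort performs on this input */
    printf("%lld\n", count_inversions(a, tmp, 0, 5));
    return 0;
}

As for the P.S.: if all elements are distinct, sorting in descending order needs n(n-1)/2 minus that number of swaps, so a single inversion count answers both questions.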

Is this a good way to speed up array processing?

So, today I woke up with this single idea.
Just suppose you have a long list of things, an array, and you have to check each one of them to find the one that matches what you're looking for. To do this, you could maybe use a for loop. Now, imagine that the one you're looking for is almost at the end of the list, but you don't know it. So, in that case, assuming it doesn't matter in which order you check the elements of the list, it would be more convenient for you to start from the last element rather than the first one, just to save some time and maybe memory. But then, what if your element is almost at the beginning?
That's when I thought: what if I could start checking the elements from both ends of the list at the same time?
So, after several tries, I came up with this raw sample code (written in JS) that, in my opinion, would solve what we were defining above:
function fx(list) {
    var len = list.length;
    // To save some time as we were saying, we could check first if the array
    // is shorter than we were expecting
    if (len <= 1) {
        // If it is, then we just process the only element (if any) anyway
        /*
        ...
        list[0]
        ...
        */
        return;
    }
    // So, now here's the thing. The number of loops won't be the length of
    // the list but just half of it.
    var last = len - 1;
    for (var i = 0; i <= last; i++, last--) {
        // And inside each loop we process both the first and last unchecked
        // elements, and so on until we reach the middle or find the one
        // we're looking for, whatever happens first
        /*
        ...
        list[i]
        list[last]
        ...
        */
    }
}
Anyway, I'm still not totally sure whether this would really speed up the process, slow it down, or make no difference at all. That's why I need your help, guys.
In your own experience, what do you think? Is this really a good way to make this kind of process faster? If it is or it isn't, why? Is there a way to improve it?
Thanks, guys.
Your proposed algorithm is good if you know that the item is likely to be at the beginning or end but not in the middle, bad if it's likely to be in the middle, and merely overcomplicated if it's equally likely to be anywhere in the list.
In general, if you have an unsorted list of n items then you potentially have to check all of them, and that will always take time which is at least proportional to n (this is roughly what the notation “O(n)” means) — there are no ways around this, other than starting with a sorted or partly-sorted list.
In your scheme, the loop runs for only n/2 iterations, but it does about twice as much work in each iteration as an ordinary linear search (from one end to the other) would, so it's roughly equal in total cost.
If you do have a partly-sorted list (that is, you have some information about where the item is more likely to be), then starting with the most likely locations first is a fine strategy. (Assuming you're not frequently looking for items which aren't in the list at all, in which case nothing helps you.)
If you work from both ends, then you'll get the worst performance when the item you're looking for is near the middle. No matter what you do, sequential searching is O(n).
If you want to speed up searching a list, you need to use a better data structure, such as a sorted list, hash table, or B-tree.
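To illustrate that last point with the sorted-list option (a sketch of my own, in C since the idea is language-independent): once the data is kept sorted, binary search finds an item in O(log n) comparisons instead of O(n):

#include <stdio.h>

/* Returns the index of target in the sorted array a[0..n), or -1. */
static int binary_search(const int a[], int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo + hi) / 2 */
        if (a[mid] == target)
            return mid;
        else if (a[mid] < target)
            lo = mid + 1;               /* target can only be to the right */
        else
            hi = mid - 1;               /* target can only be to the left */
    }
    return -1;                          /* not present */
}

int main(void) {
    int a[] = {2, 3, 5, 7, 11, 13, 17};
    printf("%d\n", binary_search(a, 7, 11));   /* prints 4 */
    return 0;
}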

How will you delete duplicate odd numbers from a linked list?

Requirements/constraint:
delete only duplicates
keep one copy
list is not initially sorted
How can this be implemented in C?
(An algorithm and/or code would be greatly appreciated!)
If the list is very long and you want reasonable performance, and you are OK with allocating an extra O(log n) of memory, you can sort it in O(n log n) using quicksort or merge sort:
http://swiss-knife.blogspot.com/2010/11/sorting.html
Then you can remove the duplicates in O(n) (the total is O(n log n) + O(n)).
If your list is very tiny, you can do as jswolf19 suggests, and you will get n(n-1)/2 comparisons in the worst case.
There are several different ways of detecting/deleting duplicates:
Nested loops
Take the next value in sequence, then scan until the end of the list to see if this value occurs again. This is O(n^2) -- although I believe the bounds can be argued lower? -- but the actual performance may be better, as only scanning from i to the end (not 0 to the end) is done, and it may terminate early. This does not require extra data aside from a few variables.
(See Christoph's answer for how this could be done using just a traversal of the linked list and destructive "appending" to a new list -- e.g. the nested loops don't have to "feel" like nested loops.)
Sort and filter
Sort the list (merge sort can be modified to work on linked lists) and then detect duplicate values (they will be side-by-side now). With a good sort this is O(n*lg(n)). The sorting phase usually is/can be destructive (so you still have "one copy"), but the original order has been modified ;-)
Scan and maintain a look-up
Scan the list and, as the list is scanned, add the values to a lookup. If the lookup already contains said value, then there is a duplicate! This approach can be O(n) if lookup access is O(1). Generally a "hash/dictionary" or "set" is used as the lookup, but if only a limited range of integers is used then an array will work just fine (e.g. the index is the value). This requires extra storage but no "extra copy" -- at least in a literal reading.
For small values of n, big-O is pretty much worthless ;-)
Happy coding.
I'd either
mergesort the list followed by a linear scan to remove duplicates
use an insertion-sort based algorithm which already removes duplicates when re-building the list
The former will be faster; the latter is easier to implement from scratch: just construct a new list by popping elements off your old list and inserting them into the new one, scanning it until you hit an element of greater value (in which case you insert the element there) or equal value (in which case you discard the element).
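A minimal sketch of that second, insertion-based option in C (the node type and names are my own assumptions, since the question doesn't give a struct); note that the rebuilt list comes out sorted, so the original order is not preserved:

#include <stdio.h>
#include <stdlib.h>

typedef struct node { int val; struct node *next; } node;

/* Pop nodes off the old list one at a time and insert each into a sorted
   new list, discarding a node when an equal value is already there. */
static node *dedup_by_insertion(node *old) {
    node *sorted = NULL;
    while (old) {
        node *n = old;              /* pop the head of the old list */
        old = old->next;

        node **p = &sorted;         /* scan until we hit a >= element */
        while (*p && (*p)->val < n->val)
            p = &(*p)->next;

        if (*p && (*p)->val == n->val) {
            free(n);                /* equal value already present: discard */
        } else {
            n->next = *p;           /* greater value (or end of list): insert */
            *p = n;
        }
    }
    return sorted;
}

int main(void) {
    int vals[] = {3, 1, 3, 2, 1};
    node *head = NULL;
    for (int i = 4; i >= 0; i--) {  /* build 3 -> 1 -> 3 -> 2 -> 1 */
        node *n = malloc(sizeof *n);
        n->val = vals[i]; n->next = head; head = n;
    }
    head = dedup_by_insertion(head);
    for (node *n = head; n; n = n->next)
        printf("%d ", n->val);      /* prints: 1 2 3 */
    return 0;
}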
Well, you can sort the list first and then check for duplicates, or you could do one of the following:
for i from 0 to list.length-1
    for j from i+1 to list.length-1
        if list[i] == list[j]
            // delete one of them
        fi
    loop
loop
This is probably the most unoptimized piece of crap, but it'll probably work.
Iterate through the list, keeping a pointer to the previous node each time you move on to the next one. Inside that loop, iterate through the rest of the list to check for a duplicate. If there is a duplicate, back in the main loop, fetch the node after the duplicate, point the previous node's next pointer at it (unlinking the duplicate), then break out of the loop and restart the whole process until there are no duplicates.
You can do this in linear time using a hash table.
You'd want to scan through the list sequentially. Each time you encounter an odd-numbered element, look it up in your hash table. If that number is already in the hash table, delete it from the list; if not, add it to the hash table and continue.
Basically the idea is that for each element you scan in the list, you are able to check in constant time whether it is a duplicate of a previous element that you've seen. This takes only a single pass through your list and will take at worst a linear amount of memory (worst case is that every element of the list is a unique odd number, thus your hash table is as long as your list).
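A sketch of that hash-table pass in C, under some assumptions of my own: a toy chained hash table with a fixed NBUCKETS, a minimal node struct, and no cleanup of the table afterwards. A real program would size the table from the data and free the entries:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node { int val; struct node *next; } node;

/* Toy hash set for ints: chaining into NBUCKETS bucket lists. */
#define NBUCKETS 1024
typedef struct entry { int val; struct entry *next; } entry;

/* Insert v if absent; report whether it was already there. */
static bool seen_before(entry *tab[], int v) {
    unsigned b = (unsigned)v % NBUCKETS;
    for (entry *e = tab[b]; e; e = e->next)
        if (e->val == v) return true;
    entry *e = malloc(sizeof *e);
    e->val = v; e->next = tab[b]; tab[b] = e;
    return false;
}

/* Single pass: unlink every odd value we have already seen. */
static void remove_dup_odds(node **head) {
    entry *tab[NBUCKETS] = {0};
    node **p = head;
    while (*p) {
        node *n = *p;
        if (n->val % 2 != 0 && seen_before(tab, n->val)) {
            *p = n->next;        /* duplicate odd: unlink and free it */
            free(n);
        } else {
            p = &(*p)->next;     /* first occurrence, or even: keep it */
        }
    }
    /* (freeing the table entries is omitted for brevity) */
}

int main(void) {
    int vals[] = {3, 4, 3, 5, 4, 5};
    node *head = NULL;
    for (int i = 5; i >= 0; i--) {  /* build 3 -> 4 -> 3 -> 5 -> 4 -> 5 */
        node *n = malloc(sizeof *n);
        n->val = vals[i]; n->next = head; head = n;
    }
    remove_dup_odds(&head);
    for (node *n = head; n; n = n->next)
        printf("%d ", n->val);      /* prints: 3 4 5 4 (evens kept, dup odds gone) */
    return 0;
}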

What type of Sorting algorithm is this?

If this is Bubble sort then what is this?
Do you see the placement of Swap()?
It's a Selection sort
The first listing is indeed selection sort. It is in essence the same as the algorithm at the link you've provided. But instead of finding the element that has the minimum value and swapping it with arr[i] once after the j loop, the first code immediately swaps arr[i] with any value it encounters that is smaller.
In both cases, at the end of the i loop, arr[i] will contain the smallest element remaining in the range i..SIZE.
There are two differences between the two algorithms: the code you show here performs more than one swap per iteration, and it shuffles the data that is not yet sorted (this is not really important, as those elements will eventually get sorted). So, basically, it is less efficient than the code you've linked.
It's a kind of selection sort (as Maciej Hehl already said), but very inefficient. You swap way too many times. The effect is that you end up swapping with the minimum, but on the way there you swap with every other element that is smaller than the one you are looking at. That is unnecessary.
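Since the original listings aren't shown here, a reconstruction of the two variants being discussed (my own sketch, in C): the eager version that swaps every time it sees a smaller element, and the classic selection sort from the link that swaps once per position:

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Reconstructed variant from the question: swaps immediately whenever a
   smaller element is found. arr[i] still ends up holding the minimum of
   arr[i..n-1], but via many swaps. */
static void eager_selection_sort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[i])
                swap(&arr[i], &arr[j]);
}

/* Classic selection sort: find the minimum first, then swap once per i. */
static void selection_sort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[min])
                min = j;
        if (min != i)
            swap(&arr[i], &arr[min]);
    }
}

int main(void) {
    int a[] = {4, 2, 5, 1, 3}, b[] = {4, 2, 5, 1, 3};
    eager_selection_sort(a, 5);
    selection_sort(b, 5);
    for (int i = 0; i < 5; i++)
        printf("%d %d\n", a[i], b[i]);   /* both columns come out sorted */
    return 0;
}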
Maybe I am missing something subtle, but the first link (lorenzod8n.wordpress.com) shows a bubble sort; the second (cprogramminglanguage.net) a selection sort. It even says so in the text.
Edit: I missed/forgot something subtle: Bubble Sort exchanges adjacent entries; the algorithm in the first link doesn't. Hence, it's not Bubble Sort, though it does have Bubble Sort's excessive swapping behavior.
