Efficient method to sort an array (The method of sorting must be picking an element from the array and placing it elsewhere in the array) - c

What is the most efficient method I can use to sort an array of integers by selecting one integer at a time and inserting it anywhere in the array? Or, inserting it at the end or the beginning of the array?
I'm looking for an algorithm that can do this in the minimum number of steps, unlike selection sort.

You need to find the longest subsequence of elements that is already sorted; it does not have to be contiguous. The number of steps is then the number of elements that are not in that subsequence.
The elements that don't belong to the chosen subsequence are out of order, so each of them has to be moved to its correct position. Each move is one step according to your description, and since you kept the longest possible sorted subsequence, this gives the minimum number of steps.
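To make that concrete, here is a minimal C sketch of the idea, assuming distinct integers (with duplicates you'd search for the first tail strictly greater than the current element instead). It computes the length of the longest increasing subsequence in O(n log n); the answer is n minus that length:

    #include <stdio.h>

    /* Length of the longest increasing subsequence in O(n log n):
       tails[k] holds the smallest possible tail value of an increasing
       subsequence of length k+1 found so far. */
    static int lis_length(const int *a, int n) {
        int tails[n];            /* VLA for brevity; allocate in real code */
        int len = 0;
        for (int i = 0; i < n; i++) {
            int lo = 0, hi = len;
            while (lo < hi) {    /* binary search: first tail >= a[i] */
                int mid = (lo + hi) / 2;
                if (tails[mid] < a[i]) lo = mid + 1; else hi = mid;
            }
            tails[lo] = a[i];    /* extend or improve a subsequence */
            if (lo == len) len++;
        }
        return len;
    }

    int main(void) {
        int a[] = {3, 1, 2, 5, 4};
        int n = sizeof a / sizeof a[0];
        /* 1, 2, 4 already appear in increasing order, so only the
           other two elements (3 and 5) need to be picked up and moved */
        printf("minimum moves: %d\n", n - lis_length(a, n));
        return 0;
    }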

Related

Find first non-consecutive array element using Google Sheets Filter

I have to make this work in an (always sorted) array that results from splitting a delimited string of sorted whole numbers. Concretely, if my cell contains "1,2,3,5", I need a formula that evaluates to 4. A beneficial side-effect would be an implementation that gives last-number + 1 when the original array has only consecutive numbers, i.e., applying the formula to "1,2,3,4,5" would evaluate to 6.
My approach has been to generate a new array that is a perfect sequence and compare it with my original array, to find the first element where the two arrays are not equal.
Creating a perfect sequence of the array like this:
=TRANSPOSE(SEQUENCE(COUNT(arr),1,MIN(arr),1))
So all that would be left to do is compare arr with the sequence above to find the first element that differed, something like:
=COUNTA(IFERROR(FILTER(arr, MATCH(arr, TRANSPOSE(SEQUENCE(COUNT(arr),1,MIN(arr),1)), 0))))
Sadly, what I have above is not correctly "short-circuiting" at the first non-equal value of the arrays. Is COUNTIF the way to go?
If my previous step gets me the index of the element instead of the value, then what remains is to get the value at that index:
=INDEX(arr, 1, COUNTA(IFERROR(FILTER(arr, MATCH(arr, TRANSPOSE(SEQUENCE(COUNT(arr),1,MIN(arr),1)), 0)))))
Is there a more straightforward way to get the first non-consecutive element? A way that does not involve actual ranges in the spreadsheet?
After some thought, I realized that set-subtracting (i.e. set difference) the original array from the generated sequence always gives me the first missing number.
So my final formula to handle this and the case where all numbers are in sequence is this:
=IFERROR(INDEX(FILTER(TRANSPOSE(SEQUENCE(COUNT(SPLIT(G7," ")),1,LEFT(G7,4),1)), ISERROR(MATCH(TRANSPOSE(SEQUENCE(COUNT(SPLIT(G7," ")),1,LEFT(G7,4),1)), SPLIT(G7," "), FALSE))), 1, 1), RIGHT(G7,4)+1)
I'm sure there is a better, more concise answer, but this does the job.
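Outside a formula the same set-difference idea is only a few lines. A hypothetical C sketch, assuming the cell has already been split into a sorted array of ints:

    #include <stdio.h>

    /* First value missing from a sorted run of consecutive integers;
       if there is no gap, returns last + 1. */
    static int first_gap(const int *a, int n) {
        for (int i = 0; i < n; i++)
            if (a[i] != a[0] + i)      /* compare with the perfect sequence */
                return a[0] + i;       /* first element of the difference */
        return a[n - 1] + 1;
    }

    int main(void) {
        int a[] = {1, 2, 3, 5};
        printf("%d\n", first_gap(a, 4));   /* prints 4 */
        int b[] = {1, 2, 3, 4, 5};
        printf("%d\n", first_gap(b, 5));   /* prints 6 */
        return 0;
    }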

Find duplicates in an array in linear time

Problem: You are given an array of n+1 integers from the range 1..n. At least one number is duplicated. All array values may be the same. Print all duplicates in linear time and constant space. The array can't be modified.
The obvious solution would be to create a bit array defaulting to false, set bitarray[array[i]] for each element, and check whether it's already set. That requires additional space, so it's no good. Another thought: reorder the array by hash and check if the current element and the element at array[hash % n] are equal. That's also no good, since we can't modify the original array. Now it looks like an impossible task. Is there even a solution to this?
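For reference, the "obvious" bit-array solution being ruled out looks like this in C. It is linear time, but the two flag arrays are O(n) extra space, which is exactly what the constraints forbid:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>

    /* O(n) time but O(n) extra space: violates the problem's limits. */
    static void print_duplicates(const int *a, int n) {
        bool *seen = calloc(n + 1, sizeof *seen);         /* value met before? */
        bool *reported = calloc(n + 1, sizeof *reported); /* printed already? */
        for (int i = 0; i < n + 1; i++) {                 /* n+1 entries total */
            int v = a[i];
            if (seen[v] && !reported[v]) {
                printf("%d\n", v);
                reported[v] = true;
            }
            seen[v] = true;
        }
        free(seen);
        free(reported);
    }

    int main(void) {
        int a[] = {2, 3, 1, 3, 2};    /* n = 4, values in 1..4 */
        print_duplicates(a, 4);       /* prints 3, then 2 */
        return 0;
    }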

Quickest way to find 5 largest values in an array of structs

I have an array of structs called struct Test testArray[25].
The Test struct contains a member called int size.
What is the fastest way to get another array of Test structs containing everything from the original except the 5 largest, based on the member size? WITHOUT modifying the original array.
NOTE: The number of items in the array can be much larger; I was just using 25 for testing, and the values could be dynamic. I just wanted a smaller subset for testing.
I was thinking of making a copy of the original testArray and then sorting that array. Then return an array of Test structs that did not contain the top 5 or bottom 5 (depending on asc or desc).
OR
Iterating through the testArray looking for the largest 5 and then making a copy of the original array excluding the largest 5. This way seems like it would iterate through the array too many times comparing to the array of 5 largest that had been found.
Follow up question:
Here is what I am doing now; let me know what you think.
Considering the number of largest elements I am interested in is going to remain the same, I iterate through the array, find the largest element, and swap it to the front of the array. Then I skip the first element, look for the largest after that, and swap it into the second index, and so on, until I have the first 5 largest. Then I stop sorting and just copy everything from the sixth index onward into a new array.
This way, no matter what, I only iterate through the array 5 times, and I do not have to sort the whole thing.
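A sketch of that scheme in C, with plain ints standing in for the structs and made-up sample values; run it on a copy so the original array is untouched:

    #include <stdio.h>

    #define K 5

    /* K passes of selection: pass i swaps the largest remaining
       element into index i. O(K*n) comparisons, no full sort. */
    static void k_largest_to_front(int *a, int n) {
        for (int i = 0; i < K && i < n; i++) {
            int max = i;
            for (int j = i + 1; j < n; j++)
                if (a[j] > a[max]) max = j;
            int t = a[i]; a[i] = a[max]; a[max] = t;
        }
    }

    int main(void) {
        int a[] = {7, 2, 9, 4, 8, 1, 6, 3, 10, 5};
        int n = sizeof a / sizeof a[0];
        k_largest_to_front(a, n);
        for (int i = K; i < n; i++)    /* everything but the 5 largest */
            printf("%d ", a[i]);
        printf("\n");
        return 0;
    }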
Partial sorting with a linear-time selection algorithm will do this in O(n) time, whereas sorting would be O(n log n).
To quote the Partial Sorting page:
The linear-time selection algorithm described above can be used to find the k smallest or the k largest elements in worst-case linear time O(n). To find the k smallest elements, find the kth smallest element using the linear-time median-of-medians selection algorithm. After that, partition the array with the kth smallest element as pivot. The k smallest elements will be the first k elements.
You can find the k largest items in O(n), although making a copy of the array or an array of pointers to each element (smarter) will cost you some time as well, but you have to do that regardless.
If you'd like me to give a complete explanation of the algorithm involved, just comment.
Update:
Regarding your follow-up question, which basically suggests iterating over the list five times: that will work, but it iterates over the list more times than necessary. Finding the k largest elements in one pass (using an O(n) selection algorithm) is much better. You iterate once to do the selection and once more to build your new array; if you use median-of-medians you won't need a third pass to remove the five largest items, since you can just split the working array in two around the 5th-largest item. Compare that with iterating once to build your new array plus an additional five times.
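Here is a sketch of that partition idea in C. It uses randomized quickselect for brevity, so it is expected O(n) rather than worst-case O(n) (swap in median-of-medians for the guarantee); struct Test and its size member come from the question, while the sample values are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct Test { int size; };

    static void swap_test(struct Test *a, struct Test *b) {
        struct Test t = *a; *a = *b; *b = t;
    }

    /* Rearrange a[0..n-1] so the k entries with the largest .size end
       up in a[0..k-1], in no particular order: quickselect for index
       k-1 under a descending order. */
    static void select_top_k(struct Test *a, int n, int k) {
        int lo = 0, hi = n - 1, m = k - 1;  /* m: slot of the k-th largest */
        while (lo < hi) {
            int p = a[lo + rand() % (hi - lo + 1)].size;  /* random pivot */
            int i = lo, j = hi;
            while (i <= j) {                /* Hoare partition, descending */
                while (a[i].size > p) i++;
                while (a[j].size < p) j--;
                if (i <= j) { swap_test(&a[i], &a[j]); i++; j--; }
            }
            if (m <= j) hi = j;             /* target is in the left part */
            else if (m >= i) lo = i;        /* target is in the right part */
            else break;                     /* a[m] equals the pivot: done */
        }
    }

    int main(void) {
        struct Test orig[] = {{7},{2},{9},{4},{8},{1},{6},{3},{10},{5}};
        int n = sizeof orig / sizeof orig[0], k = 5;

        struct Test work[10];
        memcpy(work, orig, sizeof orig);    /* original stays unmodified */
        select_top_k(work, n, k);

        for (int i = k; i < n; i++)         /* everything but the 5 largest */
            printf("%d ", work[i].size);
        printf("\n");
        return 0;
    }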
As stated, sorting is O(n log n + 5) and iterating is O(5n + 5). In the general case, finding the m largest numbers is O(n log n + m) using the sort algorithm and O(mn + m) using the iteration algorithm. Which is better depends on the values of m and n: for m = 5, sorting only wins while log2(n) < 5, i.e., for arrays of up to 2^5 = 32 numbers. And since a sorting pass does more work per element than a simple scan, in practice iterating takes over even sooner.
You can do better theoretically by keeping a sorted array of the m largest numbers seen so far and using binary search to maintain it, which gives you O(n log m), but again that depends on the values of n and m.
Maybe an array isn't the best structure for what you want, especially since you'd need to re-sort it every time a new value is added. Maybe a linked list is better, with a sort on insert (which is O(n) in the worst case and O(1) in the best); then just discard the last five elements. Also consider that switching a pointer is considerably faster than reallocating the entire array just to get another element in there.
Why not an AVL tree? Search and insert are O(log2 N), but you have to weigh the cost of rebalancing the tree, and whether the time spent coding it is worth it.
Using a min-heap capped at size 5, you can traverse the array and, whenever the current element is greater than the heap's minimum, pop the minimum and insert the new element.
getMin takes O(1) time and insertion takes O(log k) time, where k is the size of the heap (5 in our case). So in the worst case the complexity is O(n log k) to find the 5 largest elements; one more O(n) pass builds the excluded list.
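A sketch of the capped min-heap in C, again with plain ints and made-up values; heap[0] is always the smallest of the 5 largest elements seen so far (the code assumes n >= 5):

    #include <stdio.h>

    #define K 5

    static void sift_down(int *heap, int n, int i) {
        for (;;) {
            int s = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && heap[l] < heap[s]) s = l;
            if (r < n && heap[r] < heap[s]) s = r;
            if (s == i) return;
            int t = heap[i]; heap[i] = heap[s]; heap[s] = t;
            i = s;
        }
    }

    /* Fill heap[0..K-1] with the K largest values of a: O(n log K). */
    static void top_k(const int *a, int n, int *heap) {
        for (int i = 0; i < K; i++) heap[i] = a[i];  /* seed with first K */
        for (int i = K / 2 - 1; i >= 0; i--)         /* heapify the seed */
            sift_down(heap, K, i);
        for (int i = K; i < n; i++)
            if (a[i] > heap[0]) {       /* beats the current minimum */
                heap[0] = a[i];         /* replace the root... */
                sift_down(heap, K, 0);  /* ...and restore heap order */
            }
    }

    int main(void) {
        int a[] = {7, 2, 9, 4, 8, 1, 6, 3, 10, 5};
        int heap[K];
        top_k(a, sizeof a / sizeof a[0], heap);
        for (int i = 0; i < K; i++)
            printf("%d ", heap[i]);     /* 6..10, in heap order */
        printf("\n");
        return 0;
    }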

Way to judge if two Arrays are identical?

By identical I mean the two arrays contain the same elements; the order of the elements does not matter here.
The solution I came up with is like this (which turns out to be a wrong approach, as pointed out in the comments):
1. Check that the sizes of the two arrays are equal.
2. If so, find all elements of Array A in Array B.
3. If all are found, find all elements of Array B in Array A.
4. If all are found, conclude that the two arrays are identical.
However, is there a better algorithm in terms of time complexity?
Let's say you have a User[] array1 and a User[] array2. You can loop through array1 and add its elements to a Dictionary<User, int> where the key is the user and the value is a count. Then you loop through array2 and, for each user in it, decrement the count in the dictionary (if the count is greater than 1) or remove the entry (if the count is 1). If the user isn't in the dictionary at all, you can stop: the arrays don't match.
If you get to the end and had previously checked that the lengths of the arrays are the same, then the arrays match. If you hadn't checked the lengths earlier (which of course you still should have), you can instead verify that the dictionary is empty after completely looping through array2.
I don't know the exact performance of this, but it will be faster than sorting both lists and comparing them element by element. It takes more memory, though; if the arrays are not super large, memory usage shouldn't be an issue.
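The same counting idea sketched in C, with a tiny fixed-size linear-probing table standing in for the Dictionary (the table is sized generously for the example; a real one would grow with n):

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    #define TABLE_SIZE 64   /* must comfortably exceed n for probing */

    struct slot { int key; int count; bool used; };

    static struct slot *find_slot(struct slot *t, int key) {
        unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;
        while (t[h].used && t[h].key != key)
            h = (h + 1) % TABLE_SIZE;    /* linear probing */
        return &t[h];
    }

    /* Count elements of a, then decrement for each element of b.
       Assumes the lengths were already checked to be equal. */
    static bool same_elements(const int *a, const int *b, int n) {
        struct slot t[TABLE_SIZE];
        memset(t, 0, sizeof t);
        for (int i = 0; i < n; i++) {
            struct slot *s = find_slot(t, a[i]);
            s->used = true; s->key = a[i]; s->count++;
        }
        for (int i = 0; i < n; i++) {
            struct slot *s = find_slot(t, b[i]);
            if (!s->used || s->count == 0) return false;  /* no match left */
            s->count--;
        }
        return true;
    }

    int main(void) {
        int a[] = {0, 1, 1}, b[] = {1, 0, 1};
        printf("%d\n", same_elements(a, b, 3));   /* 1: same multiset */
        return 0;
    }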
First, check the size of the two arrays. If they aren't equal then they don't contain the same elements.
After that, sort both arrays in O(n lg n). Now just compare the arrays element by element in O(n): since they are sorted, if they are equal they will be equal at every position.
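A sketch of the sort-then-compare approach in C, working on copies so the inputs are undisturbed (size equality is assumed to have been checked first):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* True if a and b hold the same multiset of values: O(n log n). */
    static bool same_elements_sorted(const int *a, const int *b, int n) {
        int *sa = malloc(n * sizeof *sa), *sb = malloc(n * sizeof *sb);
        memcpy(sa, a, n * sizeof *sa);
        memcpy(sb, b, n * sizeof *sb);
        qsort(sa, n, sizeof *sa, cmp_int);
        qsort(sb, n, sizeof *sb, cmp_int);
        bool eq = memcmp(sa, sb, n * sizeof *sa) == 0; /* position by position */
        free(sa); free(sb);
        return eq;
    }

    int main(void) {
        int a[] = {0, 1, 1}, b[] = {1, 0, 1}, c[] = {0, 0, 1};
        printf("%d\n", same_elements_sorted(a, b, 3));  /* 1 */
        printf("%d\n", same_elements_sorted(a, c, 3));  /* 0: counts differ */
        return 0;
    }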
Your approach doesn't work, as it would treat [0, 1, 1] as being equal to [0, 0, 1]. Every item in A is in B and vice versa. You'd need to count the number of occurrences of each item in A and B. (You then don't need to do both, of course, if you've already checked the lengths.)
If the contents are sortable, you can sort both and then compare element-by-element, of course. That only works if you can provide a total ordering to elements though.
Sort both arrays according to a strict ordering and compare them term-by-term.
Update: Summarizing some of the points that have been raised, here is the efficiency you can generally expect:
strict ordering available: O(N log N) for sorting, plus comparing term-by-term
equality and hash function available: compare hash counts term-by-term, plus actual object comparisons in the event of hash collisions.
only equality, no hashing available: must count each element or copy one container and remove (efficiency depends on the container).
The complexity of comparison term-by-term is linear in the position of the first mismatch.
My idea is to loop through the first array and look for each of its items in the second array. The only issue, of course, is that you can't use an item in the second array twice. So make a third array, of booleans, indicating which items in array 2 have been used.
Loop through the first array. Inside that, loop through the elements of the second array to see if one matches, but also check the third array to verify that the position in the second array hasn't been used yet. If you find a match, update that position in the third array and move on.
You only need to do this once. If you finish having found a match for every item in array 1, then no unmatched items remain in array 2, so you don't need a second loop checking that array 1 contains every item of array 2.
Of course, before you start all that, check that the lengths are the same.
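The used-flags idea in C: O(n^2) comparisons, but no sorting or hashing needed, and duplicates can't be matched twice:

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    /* True if a and b hold the same multiset of values. Assumes the
       lengths were already checked to be equal. */
    static bool same_elements_flags(const int *a, const int *b, int n) {
        bool used[n];                   /* which slots of b are consumed */
        memset(used, 0, sizeof used);
        for (int i = 0; i < n; i++) {
            bool found = false;
            for (int j = 0; j < n; j++)
                if (!used[j] && b[j] == a[i]) {
                    used[j] = true;     /* consume this slot of b */
                    found = true;
                    break;
                }
            if (!found) return false;   /* a[i] has no unused partner */
        }
        return true;   /* equal lengths + all of a matched => identical */
    }

    int main(void) {
        int a[] = {0, 1, 1}, b[] = {1, 0, 1};
        printf("%d\n", same_elements_flags(a, b, 3));   /* 1 */
        return 0;
    }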
If you don't mind the extra space, you can use something like a HashMap to store the (element, count) pairs of the first array and then check that the second array matches up; this is linear in N (the size of the bigger array).
If the array sizes are identical and all of the elements in Array A are in Array B, then there is no need to verify that all of the elements in array B are in Array A. So at the very least you can omit that step.
EDIT: This depends on the definition of the problem. The shortcut works if and only if his original solution would work, which it wouldn't if the arrays can have duplicate items that you aren't counting or marking as "used."

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?
I advocate hashes in such cases: you'll spend time proportional to the combined size of the arrays.
Since most major languages offer a hashtable in their standard libraries, I hardly need to show you how to implement such a solution.
Iterate through each array and use a hash table to store counts: the key is the value of the integer and the value is the count of appearances.
It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When you reach equal elements, emit that value as output and advance to the next item in both arrays. (This assumes, of course, that you've already sorted both, as you suggested.)
This is an O(N+M) operation, where N is the size of one array and M the size of the other. Using a binary search you get O(N lg M) instead, which can be lower complexity if one array is a lot smaller than the other, but is likely to be a net loss if they're close to the same size.
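A sketch of that merge-style walk in C, assuming both arrays are already sorted. Skipping past runs of equal values also sidesteps the duplicate-counting problem discussed below:

    #include <stdio.h>

    /* Print the intersection of two sorted arrays in O(N + M). */
    static void print_intersection(const int *a, int n, const int *b, int m) {
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else {
                int v = a[i];
                printf("%d\n", v);
                while (i < n && a[i] == v) i++;  /* skip duplicates in a */
                while (j < m && b[j] == v) j++;  /* ...and in b */
            }
        }
    }

    int main(void) {
        int a[] = {1, 2, 2, 4, 6}, b[] = {2, 4, 4, 5};
        print_intersection(a, 5, b, 4);   /* prints 2, then 4 */
        return 0;
    }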
Depending on what you need, the versions that just count occurrences can cause a pretty substantial problem: multiple occurrences of an item within a single array still add to its count, indicating an intersection that doesn't really exist. You can prevent this, but it makes the job somewhat less trivial: insert items from the first array into your hash table, but always set the count to 1. When that's finished, process the second array, setting the count to 2 if and only if the item is already present in the table.
Define "best".
If you want to do it fast, you can do it in O(n) by iterating through each array and keeping a count for each unique element. The details of counting the unique elements depend on the alphabet of things that can be in the array, e.g., is it sparse or dense?
Note that this is O(n) in the number of arrays, but O(nm) overall for n arrays of length m.
The best way is probably to hash all the values and keep a count of occurrences, culling any value that has not occurred i times by the time you examine array i, for i = 1, 2, ..., n. Unfortunately, no deterministic algorithm can get below an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need either an acceptable level of probability (Monte Carlo) or some known property of the lists that lets it examine only a subset of elements (when considering the ith list you only care about elements that occurred in all i-1 previous lists, but in an unsorted list it's non-trivial to search for them).
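A sketch of the count-and-cull approach in C, with a small fixed probing table standing in for a real hash map (made-up data; the "last" field keeps duplicates within one array from double-counting):

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    #define TABLE_SIZE 64   /* must comfortably exceed the element count */

    struct slot { int key; int rounds; int last; bool used; };

    static struct slot *probe(struct slot *t, int key) {
        unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;
        while (t[h].used && t[h].key != key)
            h = (h + 1) % TABLE_SIZE;   /* linear probing */
        return &t[h];
    }

    /* Print values present in every one of the n arrays. A value is
       kept in round i only if it survived all i-1 previous rounds. */
    static void common_elements(const int **arrays, const int *lens, int n) {
        struct slot t[TABLE_SIZE];
        memset(t, 0, sizeof t);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < lens[i]; j++) {
                struct slot *s = probe(t, arrays[i][j]);
                if (i == 0) {           /* first array: just insert */
                    if (!s->used) {
                        s->used = true; s->key = arrays[i][j];
                        s->rounds = 1; s->last = 0;
                    }
                } else if (s->used && s->rounds == i && s->last == i - 1) {
                    s->rounds = i + 1;  /* survived another round */
                    s->last = i;        /* ignore further dups this round */
                }
            }
        for (int k = 0; k < TABLE_SIZE; k++)
            if (t[k].used && t[k].rounds == n)
                printf("%d\n", t[k].key);
    }

    int main(void) {
        int a[] = {1, 2, 3, 4}, b[] = {2, 4, 5}, c[] = {4, 2, 2};
        const int *arrays[] = {a, b, c};
        int lens[] = {4, 3, 3};
        common_elements(arrays, lens, 3);   /* prints 2 and 4 */
        return 0;
    }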
