Sorting a list restricted to specific operations and conditions - C

There are two lists of numbers, named l_a and l_b.
At the start, l_b is empty and l_a contains some quantity of positive or negative numbers.
The objective is to sort l_a with the minimum number of operations, as fast as possible,
using only the following operations and nothing else:
• sa
swap the first two elements of l_a (nothing will happen if there aren’t enough elements).
• sb
swap the first two elements of l_b (nothing will happen if there aren’t enough elements).
• sc
sa and sb at the same time.
• pa
take the first element from l_b and move it to the first position on the l_a list (nothing will happen if
l_b is empty).
• pb
take the first element from l_a and move it to the first position on the l_b list (nothing will happen if
l_a is empty).
• ra
rotate l_a toward the beginning, the first element will become the last.
• rb
rotate l_b toward the beginning, the first element will become the last.
• rr
ra and rb at the same time.
• rra
rotate l_a toward the end, the last element will become the first.
• rrb
rotate l_b toward the end, the last element will become the first.
• rrr
rra and rrb at the same time.
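On a circular doubly linked list all of these operations are O(1) pointer work. A minimal sketch of `sa` and `ra` (the node layout and helper names here are my own assumptions for illustration, not the actual Epitech interface):

```c
#include <stdlib.h>

/* Hypothetical node layout -- your actual struct may differ. */
typedef struct s_node {
    int value;
    struct s_node *next;
    struct s_node *prev;
} t_node;

/* Push a value at the front of a circular doubly linked list. */
t_node *push_front(t_node *head, int value)
{
    t_node *node = malloc(sizeof(t_node));
    node->value = value;
    if (!head) {
        node->next = node->prev = node;
        return node;
    }
    node->next = head;
    node->prev = head->prev;
    head->prev->next = node;
    head->prev = node;
    return node;
}

/* sa: swap the first two elements (no-op with fewer than 2 elements). */
t_node *op_sa(t_node *head)
{
    if (head && head->next != head) {
        int tmp = head->value;
        head->value = head->next->value;
        head->next->value = tmp;
    }
    return head;
}

/* ra: rotate toward the beginning -- the first element becomes the last.
 * With a circular list this is just advancing the head pointer: O(1). */
t_node *op_ra(t_node *head)
{
    return head ? head->next : NULL;
}
```

Because the list is circular, `ra`/`rra` never move any nodes; they only reposition the head, which is what makes rotation-heavy strategies affordable.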
I have implemented all the operations above with a circular doubly linked list in C.
Now I just need to sort l_a. I have tried to modify the iterative quicksort algorithm to use only these operations, with the median of the list as the pivot, but I just can't do it; I have no idea how to implement it in a programming language (by the way, I use C).
I found the radix sort by bit-shifting method, combined with replacing all the numbers in the initial list with their respective ranks before giving the list to the radix sort, so that the negative numbers are transformed into positive ones.
My friend implemented the radix sort with bit shifting and indexing for optimization, on the circular doubly linked list I just described, but on a list of 50,000 random numbers it takes about 180 seconds and more than 200,000 operations, which is way too much: I need to sort it in under 40 seconds and with far fewer operations.
So can you recommend the best sorting algorithm for this situation, or help me implement the iterative median-pivot quicksort, or optimize the radix sort that my friend implemented?
Thank you.
The pseudocode I tried to write for the iterative median-pivot quicksort mentioned above (I don't think it works):
WHILE stack A is not sorted
    IF stack A contains 3 or fewer elements
        Sort stack A using swap and rotate commands
        Mark a partition in stack A
    ELSE
        Calculate the median of all elements after the last partition in stack A
        Push the lower values onto stack B
        Keep the higher values in stack A
    END IF
    IF stack B contains 3 or fewer elements
        Push all elements from stack B onto stack A
    ELSE
        Calculate the median of all elements up to the last partition in stack B
        Push the higher values onto stack A
        Keep the lower values in stack B
    END IF
END WHILE
The repo with the optimized radix sort (my friend's code):
https://github.com/LouisDupraz/Epitech_Push-Swap
My implementation of the circular doubly linked list and its operations is the same as the one in the GitHub repo.

Related

Given a source and a final array, find the number of hops to generate final from the source in less than quadratic time complexity

Suppose you have an array of integers (e.g. [1 5 3 4 6]). The elements are rearranged according to the following rule. Every element can hop forward (towards the left), sliding over the elements in the indices it hops across. The process starts with the element at the second index (i.e. 5). It has a choice to hop over element 1, or it can stay in its own position. If it does choose to hop, element 1 slides down to index 2. Let us assume it does hop; our resultant array is then [5 1 3 4 6]. Element 3 can now hop over 1 or 2 positions and the process repeats. If 3 hops over one position the array will be [5 3 1 4 6], and if it hops over two positions it will be [3 5 1 4 6].
It is very easy to show that every permutation of the elements is reachable this way. Also, any final configuration is reached by a unique sequence of hops.
The question is: given a final array and a source array, find the total number of hops required to arrive at the final array from the source. An O(N^2) implementation is easy to find; however, I believe this can be done in O(N) or O(N log N). If it is not possible to do better than O(N^2), that would also be good to know.
For example, if the final array is [3,5,1,4,6] and the source array is [1,5,3,4,6], the answer is 3.
My O(N^2) algorithm is like this: loop over the positions of the source array from the end, since we know the element there is the last to move. Here it is 6, and we check its position in the final array. We calculate the number of hops necessary, then rearrange the final array to put that element back in its original position in the source array. The rearranging step goes over all the elements in the array, and the process loops over all the elements, hence O(N^2). Using a hashmap or map can help with the searching, but the map needs to be updated after every step, which keeps it at O(N^2).
P.S. I am trying to model correlation between two permutations in a Bayesian way and this is a sub-problem of that. Any ideas on modifying the generative process to make the problem simpler is also helpful.
Edit: I have found my answer. This is exactly what the Kendall tau distance measures. There is an easy merge-sort-based algorithm to compute it in O(N log N).
Consider the target array as an ordering. A target array [2 5 4 1 3] can be seen as [1 2 3 4 5], just by relabeling. You only have to know the mapping to be able to compare elements in constant time. In this instance, to compare 4 and 5 you check: index[4]=2 > index[5]=1 (in the target array), and so 4 > 5 (meaning: 4 must end up to the right of 5).
So what you really have is just a vanilla sorting problem. The ordering is just different from the usual numerical ordering; the only thing that changes is your comparison function. The rest is basically the same. Sorting can be achieved in O(n lg n), or even O(n) (radix sort). That said, you have some additional constraints: you must sort in place, and you can only swap two adjacent elements.
A strong and simple candidate would be selection sort, which will do just that in O(n^2) time. On each iteration, you identify the "leftmost" remaining element of the "unplaced" portion and swap it leftwards until it lands at the end of the "placed" portion. It can be improved to O(n lg n) with an appropriate data structure (a priority queue for identifying the "leftmost" remaining element in O(lg n) time). Since n lg n is a lower bound for comparison-based sorting, I really don't think you can do better than that.
Edit: So you're not interested in the sequence of swaps at all, only the minimum number of swaps required. This is exactly the number of inversions in the array (adapted to your particular needs: "non natural ordering" comparison function, but it doesn't change the maths). See this answer for a proof of that assertion.
One way to find the number of inversions is to adapt the Merge Sort algorithm. Since you have to actually sort the array to compute it, it turns out to be still O(nlgn) time. For an implementation, see this answer or this (again, remember that you'll have to adapt).
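A sketch of that merge-sort adaptation in C, assuming the relabeling trick above has already reduced the problem to counting inversions under plain integer order (the function names are mine):

```c
#include <stdlib.h>

/* Count inversions while merge-sorting a[lo..hi) using scratch buffer tmp. */
static long long merge_count(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return 0;
    int mid = lo + (hi - lo) / 2;
    long long inv = merge_count(a, tmp, lo, mid)
                  + merge_count(a, tmp, mid, hi);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j])
            tmp[k++] = a[i++];
        else {
            tmp[k++] = a[j++];
            inv += mid - i;   /* a[i..mid) are all greater than a[j] */
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    for (i = lo; i < hi; i++)
        a[i] = tmp[i];
    return inv;
}

/* Number of adjacent swaps (inversions) needed to sort a[0..n). */
long long count_inversions(int *a, int n)
{
    int *tmp = malloc(n * sizeof(int));
    long long inv = merge_count(a, tmp, 0, n);
    free(tmp);
    return inv;
}
```

For the example above: relabeling the final array [3,5,1,4,6] to [1,2,3,4,5] maps the source [1,5,3,4,6] to [3,2,1,4,5], which has exactly 3 inversions.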
From your answer I assume the number of hops is the total number of swaps of adjacent elements needed to transform the original array into the final array.
I suggest using something like insertion sort, but without the insertion part - the data in the arrays will not be altered.
You can keep a queue t of stalled hoppers as a balanced binary search tree with counters (the number of elements in each subtree).
You can add an element to t, remove an element from t, rebalance t, and find an element's position in t in O(log C) time, where C is the number of elements in t.
A few words on finding the position of an element: it consists of a binary search that accumulates the counts of the skipped left subtrees (plus 1 for each middle element, if you keep elements on internal nodes).
A few words on balancing/addition/removal: you have to traverse upward from the removed/added element/subtree and update the counters. The overall number of operations still holds at O(log C) for insert+rebalance and remove+rebalance.
Let t be that (balanced search tree) queue, let p be the current index into the original array a, and let q be the current index into the final array f.
Now we have 1 loop starting from left side (say, p=0, q=0):
If a[p] == f[q], then original array element hops over the whole queue. Add t.count to the answer, increment p, increment q.
If a[p] != f[q] and f[q] is not in t, then insert a[p] into t and increment p.
If a[p] != f[q] and f[q] is in t, then add f[q]'s position in queue to answer, remove f[q] from t and increment q.
The nice property is that this process will move p and q to the ends of the arrays at the same time if the arrays really are permutations of one another. Nevertheless, you should probably check p and q for running past the end to detect incorrect data, as we have no faster way to prove the data is correct.

Find the median of an unsorted array without sorting [duplicate]

This question already has answers here:
O(n) algorithm to find the median of n² implicit numbers
(3 answers)
Closed 7 years ago.
Is there a way to find the median of an unsorted array:
1. without sorting it,
2. without using the select algorithm or the median of medians?
I found a lot of other questions similar to mine, but the solutions, most if not all of them, discussed the select problem and the median of medians.
You can certainly find the median of an array without sorting it. What is not easy is doing that efficiently.
For example, you could just iterate over the elements of the array; for each element, count the number of elements less than and equal to it, until you find a value with the correct count. That will be O(n^2) time but only O(1) space.
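A sketch of that counting approach in C, restricted to odd-length arrays for brevity (the even case would average the two middle candidates):

```c
/* Median of an odd-length array by counting: O(n^2) time, O(1) space.
 * For each candidate, count how many elements are smaller and how
 * many are equal; the median is the value that can occupy the middle
 * position n/2 (0-based). */
int median_by_counting(const int *a, int n)
{
    for (int i = 0; i < n; i++) {
        int less = 0, equal = 0;
        for (int j = 0; j < n; j++) {
            if (a[j] < a[i]) less++;
            else if (a[j] == a[i]) equal++;
        }
        if (less <= n / 2 && less + equal > n / 2)
            return a[i];
    }
    return a[0]; /* unreachable for nonempty input */
}
```

The two-part condition (rather than `less == n/2`) is what makes this correct in the presence of duplicates.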
Or you could use a min heap whose size is just over half the size of the array. (That is, if the array has 2k or 2k+1 elements, the heap should have k+1 elements.) Build the heap from the first k+1 array elements using the standard heap-building algorithm (which is O(N)). Then, for each remaining element x, if x is greater than the heap's minimum, replace the minimum element with x and sift it down (which is O(log N)). At the end, the median is either the heap's minimum element (if the original array's size was odd) or the average of the two smallest elements in the heap. So that's a total of O(n log n) time, and O(n) space if you cannot rearrange array elements. (If you can rearrange array elements, you can do this in place.)
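The heap variant could look like this in C, again restricted to odd-length arrays (the helper and function names are mine; note that after replacing the root of a min-heap, the heap property is restored by moving the new element down):

```c
#include <stdlib.h>

/* Restore the min-heap property in h[0..size) starting at index i. */
static void sift_down(int *h, int size, int i)
{
    for (;;) {
        int m = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && h[l] < h[m]) m = l;
        if (r < size && h[r] < h[m]) m = r;
        if (m == i) return;
        int t = h[i]; h[i] = h[m]; h[m] = t;
        i = m;
    }
}

/* Median of an odd-length array via a min-heap holding the n/2 + 1
 * largest elements seen so far. O(n log n) time, O(n) space. */
int median_via_heap(const int *a, int n)
{
    int k = n / 2 + 1;                    /* heap size */
    int *h = malloc(k * sizeof(int));
    for (int i = 0; i < k; i++)
        h[i] = a[i];
    for (int i = k / 2 - 1; i >= 0; i--)  /* heapify the first k values */
        sift_down(h, k, i);
    for (int i = k; i < n; i++)
        if (a[i] > h[0]) {                /* keep only the k largest */
            h[0] = a[i];
            sift_down(h, k, 0);
        }
    int med = h[0];  /* minimum of the k largest = median for odd n */
    free(h);
    return med;
}
```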
There is a randomized algorithm able to accomplish this task in O(n) steps (average case scenario), but it does involve sorting some subsets of the array. And, because of its random nature, there is no guarantee it will actually ever finish (though this unfortunate event should happen with vanishing probability).
I will leave the main idea here. For a more detailed description and for the proof of why this algorithm works, check here.
Let A be your array and let n=|A|. Lets assume all elements of A are distinct. The algorithm goes like this:
Randomly select t = n^(3/4) elements from A.
Let T be the "set" of the selected elements. Sort T.
Set pl = T[t/2-sqrt(n)] and pr = T[t/2+sqrt(n)].
Iterate through the elements of A and determine how many elements are less than pl (denoted by l) and how many are greater than pr (denoted by r). If l > n/2 or r > n/2, go back to step 1.
Let M be the set of elements in A in between pl and pr. M can be determined in step 4, just in case we reach step 5. If the size of M is no more than 4t, sort M. Otherwise, go back to step 1.
Return m = M[n/2-l] as the median element.
The main idea behind the algorithm is to obtain two elements (pl and pr) that enclose the median element (i.e. pl < m < pr) and that are very close to each other in the ordered version of the array, and to do this without actually sorting the array. With high probability, all six steps only need to execute once (i.e. you will get pl and pr with these "good" properties from the first pass through steps 1-5, with no going back to step 1). Once you find two such elements, you can simply sort the elements in between them and find the median element of A.
Step 2 and step 5 do involve some sorting (which might be against the "rules" you've mysteriously established :p). If sorting a sub-array is on the table, you should use some sorting method that does it in O(s log s) steps, where s is the size of the array you are sorting. Since T and M are significantly smaller than A, the sorting steps take "less than" O(n) steps. If it is also against the rules to sort a sub-array, then consider that in both cases the sorting is not really needed: you only need a way to determine pl, pr and m, which is just another selection problem (with the respective indices). While sorting T and M does accomplish this, you could use any other selection method (perhaps something rici suggested earlier).
A non-destructive routine selip() is described at http://www.aip.de/groups/soe/local/numres/bookfpdf/f8-5.pdf. It makes multiple passes through the data, at each stage making a random choice of items within the current range of values and then counting the number of items to establish the ranks of the random selection.

Applying same function to every element in an array in C

Say I need to find the Euclidean distance from one (x,y) coordinate to every coordinate in an array of a million coordinates, and then select the coordinate with the smallest distance.
At present I loop through the million-element array, calculating each distance and keeping track of the minimum. Is there a way I could do it differently and faster?
Thanks
You can improve your algorithm significantly by using a more complex data structure, for instance a k-d tree. Still, if all you expect to do is search once for the nearest neighbour, you cannot possibly do better than iterating over all points.
What you can do, though, is optimize the function that computes the distance, and also (as mentioned in the comments) omit the square root: comparing the squares of two non-negative numbers gives the same result as comparing the values themselves.
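Concretely, the squared-distance comparison could look like this (a sketch; `point` and `nearest_index` are illustrative names, not from the question):

```c
#include <float.h>
#include <stddef.h>

typedef struct { double x, y; } point;

/* Index of the point in pts[0..n) closest to target.
 * The square root is omitted: sqrt is monotonic on non-negative
 * values, so comparing squared distances picks the same winner. */
size_t nearest_index(const point *pts, size_t n, point target)
{
    size_t best = 0;
    double best_d2 = DBL_MAX;
    for (size_t i = 0; i < n; i++) {
        double dx = pts[i].x - target.x;
        double dy = pts[i].y - target.y;
        double d2 = dx * dx + dy * dy;
        if (d2 < best_d2) {
            best_d2 = d2;
            best = i;
        }
    }
    return best;
}
```

Over a million points this saves a million `sqrt` calls at no cost in correctness.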
What I understand from the question is that you want to find the closest pair of points. There is an algorithm for the closest pair of points problem to solve this.
Closest Pair of a set of points:
Divide the set into two equal-sized parts by a line l, and recursively compute the minimal distance in each part.
Let d be the minimum of the two minimal distances.
Eliminate points that lie farther than d from l.
Sort the remaining points according to their y-coordinates.
Scan the remaining points in y order and compute the distance from each point to its five neighbors.
If any of these distances is less than d, update d.
The whole closest-pair algorithm takes O(log n * n log n) = O(n log^2 n) time.
We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by asking that the recursive solution computed in Step 1 returns the points in sorted order by their y coordinates. This will yield two sorted lists of points which need only be merged (a linear time operation) in Step 4 in order to yield a complete sorted list. Hence the revised algorithm involves making the following changes:
Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate.
Step 4: Merge the two sorted lists into one sorted list in O(n) time.
Hence the merging process is now dominated by the linear time steps thereby yielding an O(nlogn) algorithm for finding the closest pair of a set of points in the plane.
You could save quite a chunk of time by first checking whether both the distance along x and the distance along y are <= the last (squared) distance you stored. Only if that is true do you carry on and calculate the full squared distance. Of course, the amount of time you save depends on how the points are distributed.

Quickest way to find 5 largest values in an array of structs

I have an array of structs called struct Test testArray[25].
The Test struct contains a member called int size.
What is the fastest way to get another array of Test structs that contains all of the originals excluding the 5 largest, based on the member size, WITHOUT modifying the original array?
NOTE: the number of items in the array can be much larger; I was just using this size for testing, and the values could be dynamic. I just wanted a smaller subset for testing.
I was thinking of making a copy of the original testArray and then sorting that copy, then returning an array of Test structs that does not contain the top 5 or bottom 5 (depending on ascending or descending order).
OR
Iterating through testArray looking for the 5 largest and then making a copy of the original array excluding them. This way seems like it would iterate through the array too many times, comparing each element against the array of the 5 largest found so far.
Follow up question:
Here is what I am doing now; let me know what you think.
Considering that the number of largest elements I am interested in will remain the same, I iterate through the array, find the largest element and swap it to the front of the array. Then I skip the first element, look for the largest after that, and swap it into the second index, and so on, until I have the first 5 largest. Then I stop sorting and just copy from the sixth index to the end into a new array.
This way, no matter what, I only iterate through the array 5 times, and I do not have to sort the whole thing.
Partial sorting with a linear-time selection algorithm will do this in O(n) time, where full sorting would be O(n log n).
To quote the Partial Sorting page:
The linear-time selection algorithm described above can be used to find the k smallest or the k largest elements in worst-case linear time O(n). To find the k smallest elements, find the kth smallest element using the linear-time median-of-medians selection algorithm. After that, partition the array with the kth smallest element as pivot. The k smallest elements will be the first k elements.
You can find the k largest items in O(n). Making a copy of the array, or an array of pointers to each element (smarter), will cost you some time as well, but you have to do that regardless.
If you'd like me to give a complete explanation of the algorithm involved, just comment.
Update:
Regarding your follow up question, which basically suggests iterating over the list five times... that will work. But it iterates over the list more times than you need to. Finding the k largest elements in one pass (using an O(n) selection algorithm) is much better than that. That way you iterate once to make your new array, and once more to do the selection (if you use median-of-medians, you will not need to iterate a third time to remove the five largest items as you can just split the working array into two parts based on where the 5th largest item is), rather than iterating once to make your new array and then an additional five times.
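A sketch of that one-pass selection step in C, using an expected-O(n) quickselect partition (the median-of-medians pivot mentioned above would make it worst-case linear, but is longer to write; the function names are mine):

```c
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition: place the pivot a[hi] into its final sorted slot
 * within a[lo..hi], and return that slot's index. */
static int partition(int *a, int lo, int hi)
{
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Rearrange a[0..n) so its k largest values occupy a[n-k..n)
 * (in no particular order). Expected O(n) time. */
void split_off_k_largest(int *a, int n, int k)
{
    int lo = 0, hi = n - 1, target = n - k;
    while (lo < hi) {
        int p = partition(a, lo, hi);
        if (p == target) return;
        if (p < target) lo = p + 1;
        else            hi = p - 1;
    }
}
```

After the call, copying `a[0..n-k)` gives exactly the "everything except the k largest" array in one more pass. Run it on a copy, since the original must not be modified.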
As stated, sorting is O(n log n) and iterating is O(5n). In the general case, finding the m largest numbers is O(n log n) with the sorting algorithm and O(mn) with the iteration algorithm. Which algorithm is better depends on the values of m and n. For m = 5, iterating wins for up to about 2^5 numbers, i.e. a measly 32. However, in terms of operations, sorting is more intensive than iterating, so n will have to be quite a bit larger before sorting is actually faster.
You can do better theoretically by keeping a sorted array of the m largest numbers so far and using binary search to maintain its order, which gives you O(n log m), but again that depends on the values of n and m.
Maybe an array isn't the best structure for what you want, especially since you would need to sort it every time a new value is added. Maybe a linked list is better, with a sort on insert (which is O(N) in the worst case and O(1) in the best); then just discard the last five elements. Also, consider that just switching a pointer is considerably faster than reallocating the entire array just to get another element in there.
Why not an AVL tree? Lookup time is O(log2 N), but you have to weigh the cost of rebalancing the tree, and whether the time spent coding it is worth it.
Using a min-heap data structure with the heap size set to 5, you can traverse the array and insert into the heap whenever the minimum element of the heap is less than the current array element.
getMin takes O(1) time and insertion takes O(log k) time, where k is the number of elements in the heap (in our case 5). So in the worst case we have complexity O(n log k) to find the 5 largest elements. Another O(n) pass will produce the excluded list.
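A sketch of that size-5 min-heap scan in C, generalized to k (the function names are my own; the heap always holds the k largest values seen so far, with the smallest of them at the root):

```c
/* Restore the min-heap property in heap[0..size) starting at index i. */
static void heap_sift_down(int *heap, int size, int i)
{
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && heap[l] < heap[smallest]) smallest = l;
        if (r < size && heap[r] < heap[smallest]) smallest = r;
        if (smallest == i) return;
        int t = heap[i]; heap[i] = heap[smallest]; heap[smallest] = t;
        i = smallest;
    }
}

/* Copy the k largest values of a[0..n) into out[0..k) (unordered).
 * O(n log k) time, O(k) extra space; assumes n >= k. */
void k_largest(const int *a, int n, int k, int *out)
{
    for (int i = 0; i < k; i++)
        out[i] = a[i];
    for (int i = k / 2 - 1; i >= 0; i--)  /* heapify the first k values */
        heap_sift_down(out, k, i);
    for (int i = k; i < n; i++)
        if (a[i] > out[0]) {              /* beats the current minimum */
            out[0] = a[i];
            heap_sift_down(out, k, 0);
        }
}
```

The original array is never touched, which matches the "WITHOUT modifying the original array" requirement; the excluded list is then everything not among the k returned values.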

Optimizing a bubble sort in C that will decrease the number of swaps needed to sort a list of numbers in ascending order

To my understanding, bubble sort (the inefficient version) proceeds as follows on
2 3 8 6 23 14 16 1 123 90 (10 elements):
Compare element [0] and element [1].
If [0] is bigger than [1], a swap is made.
If [1] is bigger than [0], no swap is made.
The bubble sort then moves on to comparing elements [1] and [2], and so forth, making a total of 9 comparisons in the pass.
However, could there be a way to guarantee that on the first pass, the highest number ends up in its proper place at [9], and that on a second pass, the two highest numbers end up in their proper places at [8] and [9]?
Bubble sort is a specific algorithm -- it doesn't really make sense to ask if it can be optimized to have the property you want. It also has O(n^2) complexity, which is why it is rarely used.
There are other sorting algorithms, like selection sort, which will have a property closer to what you want. Selection sort will guarantee that on the i'th pass, the minimum i elements are in the correct positions. However, selection sort is also O(n^2), and should be avoided if you anticipate sorting a decent amount of data.
Like Basile and Jan, I recommend learning a more efficient and standard sorting algorithm, quicksort. Quicksort is very widely used, and is available in the C standard library. Wikipedia's description of the algorithm is relatively concise; a search on Google will also give many animated versions of quicksort, which can be quite helpful for learning the algorithm.
Don't use bubble sort. Consider superior algorithms, such as
Timsort
Merge sort
Quicksort (which happens to be worse than the two above in the pessimistic case)
Whenever I encounter the question "How do I optimise bubble sort?", my answer is: optimise it as long as it isn't Timsort yet.
If you've implemented the bubble sort algorithm correctly, the highest number must always be in the right place at the end of the first pass.
The optimization is simply to scan one element fewer on each successive pass:

let n equal top_element - 1
while n is greater than or equal to zero
    for i = 0 to n
        if element i is greater than element i+1 then swap
    subtract 1 from n
Two things can be done to optimize bubble-sort. First, keep track if any swaps occurred during a single pass. If none occurred, then the list must have been completely sorted by that point, so you don't have to make any additional passes. Also, decrease the range of the loop each pass, as one more element should be in the correct position after each pass.
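Both optimizations together look like this in C:

```c
#include <stdbool.h>

/* Bubble sort with both standard optimizations: stop early when a
 * pass makes no swaps, and shrink the scanned range by one each pass
 * (the tail is already in its final position). */
void bubble_sort(int *a, int n)
{
    for (int end = n - 1; end > 0; end--) {
        bool swapped = false;
        for (int i = 0; i < end; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
                swapped = true;
            }
        }
        if (!swapped)
            break; /* already sorted: no further passes needed */
    }
}
```

On an already-sorted input this degrades gracefully to a single O(n) pass, though the worst case remains O(n^2).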