mergesort algorithm - c

First, let me say that this is homework, so I am looking for advice rather than an answer. I am to write a program that reads an input sequence and then produces an array of links giving the values in ascending order.
The first line of the input file is the length of the sequence (n) and each of the remaining n lines is a non-negative integer. The
first line of the output indicates the subscript of the smallest input value. Each of the remaining output lines is a triple
consisting of a subscript along with the corresponding input sequence and link values.
(The link values are not initialized before the recursive sort begins. Each link is initialized to -1 when its sequence value is placed in a single-element list at the bottom of the recursion tree.)
The output looks something like this:
0 3 5
1 5 2
2 6 3
3 7 -1
4 0 6
5 4 1
6 1 7
7 2 0
Where (I think) the first column is the subscripts, the center is the unsorted array, and the last column is the link values. I already have the code for the mergeSort and understand how it works; I am only confused about how the links get put into place.

I used a vector of structures to hold the three values of each line.
The major steps are:
initialize the indexes and read the values from the input
sort the vector by value
determine the links
sort (back) the vector by index
Here is a sketch of the code:
struct Element {
    int index;
    int value;
    int nextIndex; // link
};
Element V[N];
int startIndex;
// for each i in 0 .. N-1:
V[i].index = i;
V[i].value = read_from_input;
sort(V); // by value
startIndex = V[0].index;
// for each i in 0 .. N-2:
V[i].nextIndex = V[i + 1].index;
V[N - 1].nextIndex = -1;
sort(V); // by index
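The steps sketched above can be written out in C roughly as follows. This is only a sketch: it uses the standard qsort() in place of the merge sort the assignment asks for, and the names Element, by_value, by_index and build_links are made up for illustration.

```c
#include <stdlib.h>

typedef struct {
    int index;      /* original subscript */
    int value;      /* input value */
    int nextIndex;  /* link: subscript of the next larger value, or -1 */
} Element;

/* Order by value; ties are broken by index so the result is deterministic. */
static int by_value(const void *a, const void *b) {
    const Element *x = a, *y = b;
    if (x->value != y->value)
        return (x->value > y->value) - (x->value < y->value);
    return (x->index > y->index) - (x->index < y->index);
}

/* Order by original subscript. */
static int by_index(const void *a, const void *b) {
    const Element *x = a, *y = b;
    return (x->index > y->index) - (x->index < y->index);
}

/* Fills in nextIndex for v[0..n-1] and returns the subscript of the
 * smallest value (or -1 when n <= 0). v[i].index must already equal i. */
int build_links(Element *v, int n) {
    if (n <= 0) return -1;
    qsort(v, (size_t)n, sizeof *v, by_value);
    int start = v[0].index;
    for (int i = 0; i + 1 < n; i++)
        v[i].nextIndex = v[i + 1].index;
    v[n - 1].nextIndex = -1;
    qsort(v, (size_t)n, sizeof *v, by_index);
    return start;
}
```

For the input values 3 5 6 7 0 4 1 2, build_links returns 4 and leaves the links 5 2 3 -1 6 1 7 0, which matches the sample output in the question.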

Related

Picking random indexes into a sorted array

Let's say I have a sorted array of values:
int n = 4; // always less than or equal to the number of unique values in the array
int i[256] = {0};
int v[] = {1, 1, 2, 4, 5, 5, 5, 5, 5, 7, 7, 9, 9, 11, 11, 13};
// EX 1 ^ ^ ^ ^
// EX 2 ^ ^ ^ ^
// EX 3 ^ ^ ^ ^
I would like to generate n random index values i[0] ... i[n-1], so that:
v[i[0]] ... v[i[n-1]] each point to a unique number (i.e. must not point to 5 twice)
Each number must be the rightmost of its kind (i.e. must point to the last 5)
An index to the final number (13 in this case) should always be included.
What I've tried so far:
Getting the indexes to the last of the unique values
Shuffling the indexes
Pick out the n first indexes
I'm implementing this in C, so the more standard C functions I can rely on and the shorter code, the better. (For example, shuffle is not a standard C function, but if I must, I must.)
Create an array of the last index of each unique value (leaving out the final number, which is added in a later step)
int last[] = { 1, 2, 3, 8, 10, 12, 14 };
Fisher-Yates shuffle the array.
Take the first n-1 elements from the shuffled array.
Add the index to the final number.
Sort the resulting array, if desired.
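The steps listed above could look roughly like this in C. It is a sketch under stated assumptions: pick_last_indices and cmp_size are hypothetical names, rand() is used unseeded, and rand() % k carries a slight modulo bias that a real implementation may want to remove.

```c
#include <stdlib.h>

/* qsort comparator for size_t values */
static int cmp_size(const void *a, const void *b) {
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Picks up to n indices into the sorted array v[0..len-1]. Each index is
 * the last occurrence of a distinct value, and len-1 is always included.
 * out is also used as scratch space and must have room for len entries.
 * Returns the number of indices written, in ascending order. */
size_t pick_last_indices(const int *v, size_t len, size_t n, size_t *out) {
    if (len == 0 || n == 0) return 0;

    /* Collect the last index of every unique value except the final one. */
    size_t count = 0;
    for (size_t k = 0; k + 1 < len; k++)
        if (v[k] != v[k + 1]) out[count++] = k;

    /* Fisher-Yates shuffle of the collected indices. */
    for (size_t k = count; k > 1; k--) {
        size_t r = (size_t)rand() % k;
        size_t tmp = out[k - 1];
        out[k - 1] = out[r];
        out[r] = tmp;
    }

    /* Keep the first n-1 entries, then add the index of the final number. */
    size_t kept = count < n - 1 ? count : n - 1;
    out[kept++] = len - 1;

    qsort(out, kept, sizeof *out, cmp_size);
    return kept;
}
```

With the array v from the question and n = 4, the function returns 4 indices, the last of which is always 15.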
The algorithm below is called reservoir sampling, and can be used whenever you know how big a sample you need but not how many elements you're sampling from. (The name comes from the idea that you always maintain a reservoir of the correct number of samples. When a new value comes in, you mix it into the reservoir, remove a random element, and continue.)
Create the return value array sample of size n.
Start scanning the input array. Each time you find a new value, add its index to the end of sample, until you have n sampled elements.
Continue scanning the array, but now when you find a new value:
a. Choose a random number r in the range [0, i) where i is the number of unique values seen so far.
b. If r is less than n, overwrite element r with the new element.
When you get to the end, sort sample, assuming you need it to be sorted.
To make sure you always have the last element in the sample, run the above algorithm to select a sample of size n-1. Only consider a new element when you have found a bigger one.
The algorithm is linear in the size of v (plus an n log n term for the sort in the last step.) If you already have the list of last indices of each value, there are faster algorithms (but then you would know the size of the universe before you started sampling; reservoir sampling is primarily useful if you don't know that.)
In fact, it is not conceptually different from collecting all the indices and then finding the prefix of a Fisher-Yates shuffle. But it uses O(n) temporary memory instead of enough to store the entire index list, which may be considered a plus.
Here's an untested sample C implementation (which requires you to write the function randrange()):
/* Produces (in `out`) a uniformly distributed sample of maximum size
* `outlen` of the indices of the last occurrences of each unique
* element in `in` with the requirement that the last element must
* be in the sample.
* Requires: `in` must be sorted.
* Returns: the size of the generated sample, which will be `outlen`
* unless there were not enough unique elements.
* Note: `out` is not sorted, except that the last element in the
* generated sample is the last valid index in `in`
*/
size_t sample(int* in, size_t inlen, size_t* out, size_t outlen) {
  size_t found = 0;
  if (inlen && outlen) {
    // The last output is fixed so we need outlen-1 random indices
    --outlen;
    int prev = in[0];
    for (size_t curr = 1; curr < inlen; ++curr) {
      if (in[curr] == prev) continue;
      // Add curr - 1 to the output: fill the reservoir first, then
      // replace a random slot with probability outlen / found
      if (found < outlen) {
        out[found++] = curr - 1;
      } else {
        size_t r = randrange(0, ++found);
        if (r < outlen) out[r] = curr - 1;
      }
      prev = in[curr];
    }
    // Add the last index to the output
    if (found > outlen) found = outlen;
    out[found++] = inlen - 1;
  }
  return found;
}
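The randrange() helper is left to the reader above. One possible version, assuming the call randrange(lo, hi) is meant to return a uniformly distributed integer in the half-open range [lo, hi), is sketched here. It is built on the standard rand(), so it only works when hi - lo is at most RAND_MAX + 1, and it needs lo < hi.

```c
#include <stdlib.h>

/* Assumed contract: returns a uniform integer r with lo <= r < hi.
 * Requires lo < hi and hi - lo <= RAND_MAX + 1. */
size_t randrange(size_t lo, size_t hi) {
    size_t span = hi - lo;
    size_t buckets = (size_t)RAND_MAX + 1;
    /* Reject draws from the final partial block to avoid modulo bias. */
    size_t limit = buckets - (buckets % span);
    size_t r;
    do {
        r = (size_t)rand();
    } while (r >= limit);
    return lo + r % span;
}
```

Call srand() once at program start if a different sequence is wanted on each run.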

minimum number of actions needed to sort an array

I'm trying to practice solving a problem from Codeforces. The task is to sort an array by moving elements either to the beginning or to the end of the array. At first I thought the answer came from the longest increasing subsequence (LIS), but that doesn't work in some cases. For example, if the input is 4,1,2,5,3, the LIS is 3, but the answer is to move 4 to the end of the array and then 5 to the end of the array, which gives 2 moves. I also tried the example 1,6,4,5,9,8,7,3,2: here the LIS is 1,4,5,9, but the answer for the problem is 7 moves (the elements between 1 and 2). I was told that I should use a greedy approach, but I couldn't quite see how it applies. Could someone help me with this?
We can see that, to sort the array, each element needs to be moved at most once.
So, to minimize the number of moves, we need to find the maximum number of elements that are not moved. Those elements form the longest continuous sequence, that is, a subsequence (a0, a1, ..., an) of the array with a(i + 1) = a(i) + 1.
For example,
(4,1,2,5,3), longest continuous sequence is (1,2,3)
(5,2,1,3,4), longest continuous sequence is (2,3,4)
So we have our code:
int[] longest = new int[n + 1];
int result = 0;
for (int i = 0; i < n; i++) {
    // data[i] is assumed to be in 1..n, so data[i] - 1 is a valid index
    longest[data[i]] = longest[data[i] - 1] + 1;
    result = Math.max(longest[data[i]], result);
}
System.out.println("Minimum number of moves is " + (n - result));
Explanation:
In the code, I am using an array longest whose entry at index i stores the length of the longest continuous sequence that ends at value i.
So, when we see the value i, longest[i] = longest[i - 1] + 1.
The length of the longest continuous sequence is then the maximum value stored in the longest array.
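For readers working in C, the same idea can be sketched as a small function. The name min_moves is made up for illustration, and the sketch assumes the input is a permutation of 1..n, as in the examples above.

```c
#include <stdlib.h>

/* Returns the minimum number of moves needed to sort data[0..n-1],
 * assuming data is a permutation of 1..n.
 * Returns -1 if memory cannot be allocated. */
int min_moves(const int *data, int n) {
    /* longest[v] = length of the longest run ..., v-1, v seen so far */
    int *longest = calloc((size_t)n + 1, sizeof *longest);
    if (longest == NULL) return -1;

    int best = 0;
    for (int i = 0; i < n; i++) {
        int v = data[i];
        longest[v] = longest[v - 1] + 1;
        if (longest[v] > best) best = longest[v];
    }
    free(longest);
    return n - best;
}
```

For 4,1,2,5,3 this gives 2, and for 1,6,4,5,9,8,7,3,2 it gives 7, the two answers mentioned in the question.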
I had solved this problem on Codeforces during the contest itself. Nice problem.
Think 'longest continuous sub-sequence'. The answer is n-longest continuous sub-sequence.
Example:
Take 1 2 3 7 5 6 4. The longest continuous sub-sequence is 1 2 3 4. Now you can shift the remaining elements in a particular order to get the sorted array always. At least that is how I thought of it intuitively
Here is a snippet of the main code:
int n = in.readInt();
int[] a = new int[n + 1];
int[] cnt = new int[n + 1];
int max = 0;
for (int i = 0; i < n; i++)
    a[i] = in.readInt();
for (int i = 0; i < n; i++) {
    cnt[a[i]] = 1 + cnt[a[i] - 1];
    max = Math.max(max, cnt[a[i]]);
}
out.printLine((n - max));
Hope that helps!

Sorting algorithm with 10 elements

I have to make a program that asks the user to input an array of 10 elements with numbers from 0 to 9.
Then the array is sent to the function. This function will sort each number in the array as follows:
I don't know how to reposition the numbers.
Here is my implementation of Joel Gregory's comment:
This function creates an unedited array and copies original's contents to it; the copy is used as the basis for checking whether any element matches an index. The function then loops from the first index to the last.
In each iteration, the function searches the unedited array for an element that matches the index number. If there is one, it replaces the value at that index with the index itself. Otherwise, zero is assigned. This continues until the last index.
#include <string.h>

void swap(int *original, int max_elements) {
    int index, matcher, unedited[max_elements];
    // copies original to unedited
    memcpy(unedited, original, max_elements * sizeof(int));
    for (index = 0; index < max_elements; index++) {
        // searches the unedited copy for a match with the current index;
        // loops until it finds a match or runs out of elements
        for (matcher = 0; matcher < max_elements && unedited[matcher] != index; matcher++)
            ;
        // if there is a match, the slot gets its own index value, else zero
        original[index] = (matcher != max_elements) ? index : 0;
    }
}
Output:
Original: 1 1 7 8 2 8 1 6 3 2
Sorted: 0 1 2 3 0 0 6 7 8 0
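The nested search above takes quadratic time. As a possible alternative, here is a linear-time sketch that first records which values occur and then rewrites the array. The name keep_present_indices is hypothetical, and the sketch assumes every value lies in 0 .. n-1, as the question states for n = 10.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rewrites a[0..n-1] so that a[i] == i when the value i occurred
 * somewhere in the input, and a[i] == 0 otherwise.
 * Assumes every input value is in the range 0 .. n-1. */
void keep_present_indices(int *a, size_t n) {
    if (n == 0) return;
    bool seen[n];
    for (size_t i = 0; i < n; i++) seen[i] = false;
    for (size_t i = 0; i < n; i++) seen[a[i]] = true;
    for (size_t i = 0; i < n; i++) a[i] = seen[i] ? (int)i : 0;
}
```

Applied to 1 1 7 8 2 8 1 6 3 2, it produces 0 1 2 3 0 0 6 7 8 0, the same result as the output shown above.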

Adjusting position in array to maintain increasing order

I have run into a problem in C with working out the logic. What I have to do is:
(1) I have an array a[215] = {0,1,2,3,4,5}. I have to add the two minimum elements of this array and then position the newly obtained element in the same array so that the array (which was already sorted) stays in increasing order.
(2) I also have to make sure that the two minimum elements that were added do not participate in sorting or addition again; once they have been added, they stay fixed at their positions. The newly obtained element, however, can participate in addition and sorting again.
eg:
We add the two minimum elements 0 and 1: 0+1=1, so "1" is the result of the addition. This "1" must now be positioned in a[] so that the array is still in increasing order.
So:
0 1 1(added here) 2 3 4 5
Now we have to find the two minimum elements again (please read point (2) again to follow this). We cannot add 0 and 1 again because they have already participated in the addition, so this time we add 1 and 2 (the 1 at index two and the 2 at index three). We get 1+2=3:
0 1 1 2 3 3 4 5 (we positioned 3 to maintain increasing order).
We repeat for the elements at index 4 and 5 (because we have already done the addition for the elements at index 0,1 and 2,3): we get 3+3=6, and again position it in a[].
0 1 1 2 3 3 4 5 6 (this time 6 is greater than 4 and 5, so it appears after 5 to maintain increasing order).
At last we will get a[] like this:
a[ ]= [0 1 1 2 3 3 4 5 6 9 15].
So the additions were performed on indices 0,1 and 2,3 and 4,5 and 6,7 and 8,9, and at the end we have 15, which is the last element, so we stop there.
Now, here is how much I have already implemented:
I have implemented the addition part, which performs the additions on the array a[ ] = [0 1 2 3 4 5].
It puts the newly obtained element at the last index (which is dataSize in my code; see data[dataSize++]=newItem).
Each time, I call the function PositionAdjustOfNewItem(data,dataSize), passing the array (which also contains the newly obtained element at the last index) as the first argument and the new size as the second argument. Here is the code:
for(i=0;i<14;i++)
for(j=1;j<15;j++)
{
// This freq contains the given array (say a[]=[0 1 2 3 4 5] in our case and
// it is inside the struct Array { int freq}; Array data[256]; )
newItem.freq = data[i].freq + data[j].freq;
data[dataSize++]=newItem;
PositionAdjustOfNewItem(data,dataSize); // Logic of this function I am not able to develop yet. Please help me here
i=i+2;
j=j+1;
}
I am not able to implement the logic of the function PositionAdjustOfNewItem(). Its first argument is the array data[], which contains all the elements with the newly added element at the last index; its second argument is the new size of the array after that element has been appended.
Each time I add two elements, I call PositionAdjustOfNewItem() with the array and the new size, and the function is supposed to put the array back in sorted order.
PositionAdjustOfNewItem() should have as low a complexity as possible. The code above is only there to show the mechanism I am using to add elements; you don't need to change anything there. I only need help with the logic of PositionAdjustOfNewItem().
(I have already done it with qsort(), but the complexity is very high.) So is there another way?
How about something like this:
NOTE: In your example, you are dealing with an array of some structure which has freq as a field. In my example, I am using simple integer arrays.
#include <stdio.h>
#include <string.h>

int a[] = {0,1,2,3,4,5};

int main(void) {
    int i,j;
    // A new array big enough to hold the result.
    int array[15];
    memcpy(array, a, 6*sizeof(int));
    int length=6;
    // Loop over consecutive indices.
    for (i=0; i+1<length; i+=2) {
        // Get the sum of the elements at these two indices.
        int sum=array[i]+array[i+1];
        // Insert the sum in the array, shifting elements where necessary.
        for (j=length-1; j>i+1; j--) {
            if (sum >= array[j]) {
                // Insert here
                break;
            } else {
                // Shift
                array[j+1]=array[j];
            }
        }
        array[j+1]=sum;
        // We now have one more element in the array
        length++;
    }
    // Display the array.
    printf("{ ");
    for (j=0; j<length; j++) {
        printf("%d ", array[j]);
    }
    printf("}\n");
    return 0;
}
To insert the sum, we traverse the array from the end to the front, looking for the spot where it belongs. If we encounter a value less than or equal to the sum, then we simply insert the sum after this value. Otherwise (i.e. the value is greater than the sum), the sum has to go before it. Thus, the value needs to be shifted one position higher, and then we check the previous value. We continue until we find the location.
If you only need the PositionAdjustOfNewItem method, then this is what it would look like:
void PositionAdjustOfNewItem(int* array, int length) {
    // The new item was appended at the last index.
    int newItem = array[length-1];
    int j;
    for (j=length-2; j>=0; j--) {
        if (newItem >= array[j]) {
            // Insert here
            break;
        } else {
            // Shift
            array[j+1]=array[j];
        }
    }
    array[j+1]=newItem;
}
When you run it, it produces the output you expect.
$ ./a.out
{ 0 1 1 2 3 3 4 5 6 9 15 }
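The question stores its values in an array of structs with a freq field rather than in plain integers. A sketch of the same insertion step for that layout might look like this; the struct definition is taken from the comment in the question's code, and the rest is an illustration rather than the asker's actual program. One call costs O(n) in the worst case, compared with O(n log n) for re-sorting the whole array with qsort().

```c
#include <stddef.h>

struct Array {
    int freq;
};

/* Assumes data[0..dataSize-2] is sorted by freq in increasing order and
 * the new item sits at data[dataSize-1]. Moves the new item down to its
 * sorted position by shifting larger elements one slot up. */
void PositionAdjustOfNewItem(struct Array *data, size_t dataSize) {
    if (dataSize < 2) return;
    struct Array newItem = data[dataSize - 1];
    size_t j = dataSize - 1;
    while (j > 0 && data[j - 1].freq > newItem.freq) {
        data[j] = data[j - 1];
        j--;
    }
    data[j] = newItem;
}
```

Appending 1 to 0 1 2 3 4 5 and calling the function with dataSize 7 yields 0 1 1 2 3 4 5, the first step of the example in the question.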

Find pairs that sum to X in an array of integers of size N having element in the range 0 to N-1

It is an interview question. We have an array of integers of size N containing elements between 0 and N-1. A number may occur more than two times. The goal is to find pairs that sum to a given number X.
I did it using an auxiliary array holding the count of each element of the primary array, then rearranged the primary array according to the auxiliary array so that it is sorted, and then searched for pairs.
But the interviewer wanted constant space complexity, so I told him to sort the array, but that is an O(n log n) time solution. He wanted an O(n) solution.
Is there any method available to do it in O(n) without any extra space?
No, I don't believe so. You either need extra space to be able to "sort" the data in O(n) by assigning to buckets, or you need to sort in-place which will not be O(n).
Of course, there are always tricks if you can make certain assumptions. For example, if N < 64K and your integers are 32 bits wide, you can multiplex the space required for the count array on top of the current array.
In other words, use the lower 16 bits for storing the values in the array and then use the upper 16 bits for your array where you simply store the count of values matching the index.
Let's use a simplified example where N == 8. Hence the array is 8 elements in length and the integers at each element are less than 8, though they're eight bits wide. That means (initially) the top four bits of each element are zero.
0 1 2 3 4 5 6 7 <- index
(0)7 (0)6 (0)2 (0)5 (0)3 (0)3 (0)7 (0)7
The pseudo-code for an O(n) adjustment which stores the count into the upper four bits is:
for idx = 0 to N - 1:
array[array[idx] % 16] += 16 // add 1 to top four bits
By way of example, consider the first index which stores 7. That assignment statement will therefore add 16 to index 7, upping the count of sevens. The modulo operator is to ensure that values which have already been increased only use the lower four bits to specify the array index.
So the array eventually becomes:
0 1 2 3 4 5 6 7 <- index
(0)7 (0)6 (1)2 (2)5 (0)3 (1)3 (1)7 (3)7
Then you have your new array in constant space and you can just use int (array[X] / 16) to get the count of how many X values there were.
But, that's pretty devious and requires certain assumptions as mentioned before. It may well be that level of deviousness the interviewer was looking for, or they may just want to see how a prospective employee handles the Kobayashi Maru of coding :-)
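The multiplexing trick described above, with 32-bit elements split into two 16-bit halves, can be sketched in C as follows. The function names pack_counts, count_of and value_at are made up for this illustration, and the sketch relies on the stated assumptions: N is below 65536 and every value lies in 0 .. N-1.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds, in place, a count of each value to the upper 16 bits of the
 * element whose index equals that value. The lower 16 bits keep the
 * original values. Assumes n < 65536 and every value is in 0 .. n-1. */
void pack_counts(uint32_t *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[a[i] & 0xFFFFu] += 0x10000u;  /* add 1 to the upper half */
}

/* Number of times x occurred in the original array. */
static inline uint32_t count_of(const uint32_t *a, size_t x) {
    return a[x] >> 16;
}

/* Original value stored at index i. */
static inline uint32_t value_at(const uint32_t *a, size_t i) {
    return a[i] & 0xFFFFu;
}
```

For the eight-element example 7 6 2 5 3 3 7 7, count_of reports 3 for the value 7 and 2 for the value 3, and value_at still returns the original elements.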
Once you have the counts, it's a simple matter to find pairs that sum to a given X, still in O(N). The basic approach would be to get the Cartesian product. For example, again consider that N is 8 and you want pairs that sum to 8. Ignore the lower half of the multiplexed array above (since you're only interested in the counts), and you have:
0 1 2 3 4 5 6 7 <- index
(0) (0) (1) (2) (0) (1) (1) (3)
What you basically do is step through the array one by one getting the product of the counts of numbers that sum to 8.
For 0, you would need to add 8 (which doesn't exist).
For 1, you need to add 7. The product of the counts is 0 x 3, so that gives nothing.
For 2, you need to add 6. The product of the counts is 1 x 1, so that gives one occurrence of (2,6).
For 3, you need to add 5. The product of the counts is 2 x 1, so that gives two occurrences of (3,5).
For 4, it's a special case since you can't use the product. In this case it doesn't matter since there are no 4s but, if there were only one, it couldn't form a pair. Where the numbers you're pairing are the same, the formula is (assuming there are m of them) 1 + 2 + 3 + ... + m-1. With a bit of mathematical wizardry, that turns out to be m(m-1)/2.
Beyond that, you're pairing with values to the left, which you've already done so you stop.
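The walk through the counts described above can be sketched as a small C function. The name count_pairs is hypothetical, and the sketch takes a plain array of counts rather than the multiplexed array, to keep the pairing logic separate from the packing trick.

```c
#include <stddef.h>

/* counts[v] is the number of occurrences of the value v, for v in 0 .. n-1.
 * Returns how many pairs of array elements have values summing to x. */
unsigned long long count_pairs(const unsigned *counts, size_t n, size_t x) {
    unsigned long long total = 0;
    /* Only visit lo <= hi so that each pair of values is counted once. */
    for (size_t lo = 0; lo < n && 2 * lo <= x; lo++) {
        size_t hi = x - lo;
        if (hi >= n) continue;  /* partner value cannot occur */
        if (lo == hi) {
            /* pairing a value with itself: m(m-1)/2 combinations */
            unsigned long long m = counts[lo];
            if (m >= 2) total += m * (m - 1) / 2;
        } else {
            total += (unsigned long long)counts[lo] * counts[hi];
        }
    }
    return total;
}
```

With the counts 0 0 1 2 0 1 1 3 from the example and x = 8, the function returns 3: one (2,6) pair and two (3,5) pairs.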
So what you have ended up with from
a b c d e f g h <- identifiers
7 6 2 5 3 3 7 7
is:
(2,6) (3,5) (3,5)
(c,b) (e,d) (f,d) <- identifiers
No other values add up to 8.
The following program illustrates this in operation:
#include <stdio.h>

int arr[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 4, 4, 4, 4};
#define SZ (sizeof(arr) / sizeof(*arr))

static void dumpArr (char *desc) {
    int i;
    printf ("%s:\n Indexes:", desc);
    for (i = 0; i < SZ; i++) printf (" %2d", i);
    printf ("\n Counts :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] / 100);
    printf ("\n Values :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] % 100);
    puts ("\n=====\n");
}
That bit above is just for debugging. The actual code to do the bucket sort is below:
int main (void) {
    int i, j, find, prod;
    dumpArr ("Initial");
    // Count the values in place (bucket sort): O(n) time, O(1) extra space.
    for (i = 0; i < SZ; i++) {
        arr[arr[i] % 100] += 100;
    }
And we finish with the code to do the pairings:
    dumpArr ("After bucket sort");
    // Now do pairings.
    find = 8;
    for (i = 0, j = find - i; i <= j; i++, j--) {
        if (i == j) {
            prod = (arr[i]/100) * (arr[i]/100-1) / 2;
            if (prod > 0) {
                printf ("(%d,%d) %d time(s)\n", i, j, prod);
            }
        } else {
            if ((j >= 0) && (j < SZ)) {
                prod = (arr[i]/100) * (arr[j]/100);
                if (prod > 0) {
                    printf ("(%d,%d) %d time(s)\n", i, j, prod);
                }
            }
        }
    }
    return 0;
}
The output is:
Initial:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Counts : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Values : 3 1 4 1 5 9 2 6 5 3 5 8 9 4 4 4 4
=====
After bucket sort:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Counts : 0 2 1 2 5 3 1 0 1 2 0 0 0 0 0 0 0
Values : 3 1 4 1 5 9 2 6 5 3 5 8 9 4 4 4 4
=====
(2,6) 1 time(s)
(3,5) 6 time(s)
(4,4) 10 time(s)
and, if you examine the input digits, you'll find the pairs are correct.
This may be done by converting the input array to the list of counters "in-place" in O(N) time. Of course this assumes input array is not immutable. There is no need for any additional assumptions about unused bits in each array element.
Start with the following pre-processing: try to move each element of the array to the position determined by its value; then move the element that was at that position to the position determined by its own value; continue until either:
next element is moved to the position from where this cycle was started,
next element cannot be moved because it is already on the position corresponding to its value (in this case put current element to the position from where this cycle was started).
After pre-processing, every element either is located at its "proper" position or "points" to its "proper" position. If we have an unused bit in each element, we can convert each properly positioned element into a counter, initialize it with "1", and let each "pointing" element increase the appropriate counter. The additional bit makes it possible to distinguish counters from values. The same thing may be done without any additional bits, but with a less trivial algorithm.
Count how many values in the array are equal to 0 or 1. If there are any such values, reset them to zero and update the counters at positions 0 and/or 1. Set k=2 (the size of the part of the array that has values less than k replaced by counters). Apply the following procedure for k = 2, 4, 8, ...
Find elements at positions k .. 2k-1 which are at their "proper" position, replace them with counters, initial value is "1".
For any element at positions k .. 2k-1 with values 2 .. k-1 update corresponding counter at positions 2 .. k-1 and reset value to zero.
For any element at positions 0 .. 2k-1 with values k .. 2k-1 update corresponding counter at positions k .. 2k-1 and reset value to zero.
All iterations of this procedure together have O(N) time complexity. At the end the input array is completely converted to the array of counters. The only difficulty here is that up to two counters at positions 0 .. 2k-1 may have values greater than k-1. But this could be mitigated by storing two additional indexes for each of them and processing elements at these indexes as counters instead of values.
After an array of counters is produced, we could just multiply pairs of counters (where corresponding pair of indexes sum to X) to get the required counts of pairs.
Comparison-based sorting is O(n log n); however, if you can assume the numbers are bounded (and you can, because you're only interested in numbers that sum to a certain value), you can use a radix sort. Radix sort takes O(kN) time, where "k" is the length of the key. That's a constant in your case, so I think it's fair to say O(N).
Generally I would however solve this using a hash e.g.
http://41j.com/blog/2012/04/find-items-in-an-array-that-sum-to-15/
Though that is of course not a linear time solution.
