Improving Mergesort. Improvement 3: Use one less copy between input and temp arrays

I am currently working on a project for my algorithms class and am at a bit of a standstill. We were assigned to improve the merge sort from our textbook by implementing specific changes. I have worked through the first two changes fine, but the third one is a killer.
The merge sort we are improving copies the contents of the input array into the temporary array, and then copies the temporary array back into the input array. So it recursively sorts the input array, copies the two sorted halves into the temporary array, and then merges the two halves from the temporary array, placing the sorted sequence back into the input array as it goes.
The point of the improvement is that this double copying is wasteful and can be avoided. The professor's hint is: we can make it so that each call to Merge only copies in one direction, but the calls to Merge alternate the direction.
This is supposedly done by blurring the lines between the original and temporary array.
I'm not really looking for code, as I am confident that I can code this; I just have no idea what I'm supposed to be doing. The professor is gone for the day, so I can't ask him until next week when I have his course again.
Has anyone done something like this before? Or can someone decipher it and put it into layman's terms for me? :P
The first improvement simply uses insertion sort whenever a subarray gets small enough that doing so saves a lot of time.
The second improvement stops allocating two dynamic arrays (for the two sorted halves) in every merge and instead allocates a single array of size n once, which is then used in place of the two dynamic arrays. That's the last one I did. The code for it is:
//#include "InsertionSort.h"
#include <stdlib.h> // needed for malloc() and free()
#define INSERTION_CUTOFF 250
// Normally declared in InsertionSort.h: sorts inputArray[p..r] in place.
void insertionSort(int* inputArray, int p, int r);
void merge3(int* inputArray, int p, int q, int r, int* tempArray)
{
    int i,j,k;
    // Copy the whole range into the temporary array...
    for (i = p; i <= r; i++)
    {
        tempArray[i] = inputArray[i];
    }
    // ...then merge the two sorted halves back into the input array.
    i = p;
    j = q+1;
    k = p;
    while (i <= q && j <= r)
    {
        if (tempArray[i] <= tempArray[j])
        {
            inputArray[k++] = tempArray[i++];
        }
        else
        {
            inputArray[k++] = tempArray[j++];
        }
    }
    // Copy any leftover left-half elements. Leftover right-half elements
    // are already in their final positions in inputArray.
    while (i <= q)
    {
        inputArray[k++] = tempArray[i++];
    }
}//merge3()
void mergeSort3Helper(int* inputArray, int p, int r, int* tempArray)
{
    if (r - p < INSERTION_CUTOFF)
    {
        insertionSort(inputArray,p,r);
        return;
    }
    int q = (p+r-1)/2;
    mergeSort3Helper(inputArray,p,q,tempArray);
    mergeSort3Helper(inputArray,q+1,r,tempArray);
    merge3(inputArray,p,q,r,tempArray);
}//mergeSort3Helper()
void mergeSort3(int* inputArray, int p, int r)
{
    if (r-p < 1)
    {
        return;
    }
    if (r - p < INSERTION_CUTOFF)
    {
        insertionSort(inputArray,p,r);
        return;
    }
    // tempArray is indexed with the same p..r indices as inputArray, so it
    // needs r+1 elements (note the parentheses around the element count).
    int* tempArray = malloc((r+1)*sizeof(int));
    if (tempArray == NULL)
    {
        return;
    }
    mergeSort3Helper(inputArray,p,r,tempArray);
    free(tempArray);
    // This version of merge sort should allocate all the extra space
    // needed for merging just once, at the very beginning, instead of
    // within each call to merge3().
}//mergeSort3()

The algorithm is like this:
A1: 7 0 2 9 5 1 4 3
A2: (uninitialized)
Step 1:
A1 : unchanged
A2: 0 7 2 9 1 5 3 4
Step 2:
A1: 0 2 7 9 1 3 4 5
A2: unchanged
Step 3:
A1: unchanged
A2: 0 1 2 3 4 5 7 9
This involves copying in only one direction each time and follows the steps of mergesort. As your professor said, you blur the lines between the work array and the sorted array by alternating which is which, and you only copy back once things are sorted.
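A minimal C sketch of the scheme illustrated above, assuming a plain bottom-up (non-recursive) merge sort with no insertion-sort cutoff; the names mergeInto and bottomUpMergeSort are hypothetical. Each pass merges runs of length width from one buffer into the other and then swaps the two buffers' roles, so every merge copies in one direction only. One extra copy is needed at the end only when the sorted data finished in the scratch buffer, which is what happens in the three-pass example above.

```c
#include <stdlib.h>
#include <string.h>

/* Merge src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1] (half-open). */
static void mergeInto(const int *src, int *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom-up merge sort that alternates the roles of `a` and a scratch
   buffer on every pass. Returns 0 on success, -1 if malloc fails. */
int bottomUpMergeSort(int *a, size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    int *src = a, *dst = tmp;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            mergeInto(src, dst, lo, mid, hi);   /* a lone run is just copied */
        }
        int *t = src; src = dst; dst = t;       /* swap roles for the next pass */
    }
    /* After the last pass the sorted data is in src. */
    if (src != a)
        memcpy(a, src, n * sizeof *a);
    free(tmp);
    return 0;
}
```

For the array 7 0 2 9 5 1 4 3, the three passes produce exactly the A2, A1, A2 contents shown in the steps above before the final copy back.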

I suspect it would be difficult and ultimately unprofitable to avoid all copying. What you want to do instead is to avoid the copy you currently do with each merge.
Your current merge3(inputArray, p,q,r, tempArray) returns the merged result in its original array, which requires a copy; it uses its tempArray buffer only as a resource. In order to do better, you need to modify it to something like merge4(inputArray, p,q,r, outputArray), where the result is returned in the second buffer, not the first.
You will need to change the logic in mergeSort3Helper() to deal with this. One approach requires a comparable interface change, to mergeSort4Helper(inputArray, p,q,r, outputArray), such that it also yields its result in its second buffer. This will require a copy at the lowest (insertion sort) level, and a second copy in the top-level mergeSort4() if you want your final result in the same buffer it came in. However, it eliminates all other unnecessary copies.
Alternatively, you could add a boolean parameter to mergeSort4Helper() to indicate whether you want the result returned in the first or second buffer. This value would alternate recursively, resulting in at most one copy, at the lowest level.
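A sketch in the spirit of these options follows, under stated assumptions. The helper uses the first option's interface (its result lands in its second buffer), but instead of copying at the insertion-sort level, one up-front copy makes both buffers hold the same values; each recursion level then swaps the roles of the two buffers, so the merge direction alternates automatically. The names mergeSort4, mergeSort4Helper and merge4 echo the answer, but the code is a hypothetical illustration, and CUTOFF is a small, made-up value so that short inputs still exercise merging.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical small cutoff; the question used INSERTION_CUTOFF 250. */
#define CUTOFF 4

/* Sort a[p..r] (inclusive) in place. */
static void insertionSort(int *a, int p, int r)
{
    for (int i = p + 1; i <= r; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= p && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Merge src[p..q] and src[q+1..r] into dst[p..r]: data moves one way only. */
static void merge4(const int *src, int p, int q, int r, int *dst)
{
    int i = p, j = q + 1, k = p;
    while (i <= q && j <= r)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i <= q) dst[k++] = src[i++];
    while (j <= r) dst[k++] = src[j++];
}

/* Leave the sorted values of [p..r] in dst, using src as the other buffer.
   Precondition: src[p..r] and dst[p..r] hold the same values. */
static void mergeSort4Helper(int *src, int p, int r, int *dst)
{
    if (r - p < CUTOFF) {
        insertionSort(dst, p, r);   /* dst already holds the values: no copy */
        return;
    }
    int q = p + (r - p) / 2;
    mergeSort4Helper(dst, p, q, src);       /* roles swap: halves sorted into src */
    mergeSort4Helper(dst, q + 1, r, src);
    merge4(src, p, q, r, dst);              /* merge src -> dst */
}

/* Sort a[p..r]. Returns 0 on success, -1 if allocation fails. */
int mergeSort4(int *a, int p, int r)
{
    if (r - p < 1)
        return 0;
    /* Indexed p..r like a, as with the question's tempArray. */
    int *b = malloc((size_t)(r + 1) * sizeof *b);
    if (b == NULL)
        return -1;
    memcpy(b + p, a + p, (size_t)(r - p + 1) * sizeof *a);   /* the only bulk copy */
    mergeSort4Helper(b, p, r, a);
    free(b);
    return 0;
}
```

Because the left recursive call only touches indices p..q, the right half of both buffers still agrees when the second call starts, which is what keeps the precondition true at every level.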
A final option might be to do the merging non-recursively, and alternate buffers at each pass. This would also result in at most one copy; however, I would expect the resulting access pattern to be inherently less cache-friendly than the recursive one.

Related

Making a character array rotate its cells left/right n times

I'm totally new here, but I've heard a lot about this site, and now that I've been accepted into a 7-month software development 'bootcamp' I'm sharpening my C knowledge for an upcoming test.
I was given a question on a test that I've already passed, but I did not finish that question and it bothers me quite a lot.
The task was to write a program in C that moves a character (char) array's cells by 1 to the left (the direction doesn't really matter to me, but the question specified left). I also set myself the rule NOT to use a temporary array, stack, or any other structure to hold the entire array's data during execution.
So a 'string' or array of chars containing '0' '1' '2' 'A' 'B' 'C' will become
'1' '2' 'A' 'B' 'C' '0' after using the function once.
Writing this was no problem, I believe I ended up with something similar to:
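void ArrayCharMoveLeft(char arr[], int arrsize, int times) {
    int i;
    // Swap neighbours along the array: the first character is carried
    // to the last cell (times is not used yet).
    for (i = 0; i < arrsize - 1; i++) {
        ArraySwap2CellsChar(arr, i, i+1);
    }
}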
As you can see, the function is somewhat modular, since it allows you to input how many times the cells need to move or shift to the left. I did not implement that part, but that was the idea.
As far as I know, there are 3 ways to do this:
Call ArrayCharMoveLeft in a loop, times times. This feels instinctively inefficient.
Use recursion in ArrayCharMoveLeft. This should resemble the first solution, but I'm not 100% sure on how to implement this.
This is the way I'm trying to figure out: No loop within loop, no recursion, no temporary array, the program will know how to move the cells x times to the left/right without any issues.
The problem is that after swapping cells this way, the remaining cells (the last times cells of the array) are sometimes out of order. For example:
Using ArrayCharMoveLeft with 3 as times with our given array mentioned above will yield
ABC021 instead of the expected value of ABC012.
I've run the following function for this:
int i;
char* lastcell;
if (!(times % arrsize))
{
    printf("Nothing to move!\n");
    return;
}
times = times % arrsize;
// Input checking: if the user passes a multiple of the array size or more,
// reduce times to the remainder modulo the array size.
for (i = 0; i < arrsize-times; i++) {
    printf("I = %d ", i);
    PrintArray(arr, arrsize);
    ArraySwap2CellsChar(arr, i, i+times);
}
As you can see, the for loop runs from 0 to arrsize - times - 1. If this function is used with, say, an array of 14 chars and times = 5, the loop runs from 0 to 8, so the last 5 cells (indices 9-13) are NOT in order (but the rest are).
The worst thing about this is that the remaining cells always keep their cyclic sequence, but at a different position. Meaning instead of 0123 they could be 3012 or 2301, etc.
I've tried different arrays with different times values and didn't find a particular pattern such as "if remaining cells = 3, then use ArrayCharMoveLeft on the remaining cells with times = 1".
It always seems to be one of two options: the remaining cells are either in order, or shifted by varying amounts. It seems to be something like this:
times   shift + direction to align
1       0
2       0
3       0
4       1R
5       3R
6       5R
7       3R
8       1R
The numbers change with different times values and arrays. Does anyone have an idea about this?
Even if you use recursion or nested loops, I'd like to hear a possible solution. The only firm rule is not to use a temporary array.
Thanks in advance!
If, for the purpose of studying and irrespective of efficiency or simplicity, you want to use only exchanges of two array elements with ArraySwap2CellsChar, you can keep your loop with some adjustments. As you noted, the given for (i = 0; i < arrsize-times; i++) loop leaves the last times elements out of place. In order to place all elements correctly, the loop condition has to be i < arrsize-1 (one less suffices, because if every element but the last is correct, the last one must be right, too). Of course, when i runs nearly up to arrsize, i+times can't be kept as the other swap index; instead, the correct index j of the element that is to be put at index i has to be computed. This computation turns out somewhat tricky, because that element may already have been swapped away from its original place. Here's a modified variant of your loop:
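for (i = 0; i < arrsize-1; i++)
{
    printf("i = %d ", i);
    int j = i+times;
    // The element that belongs at index i may already have been swapped
    // away from its original place; compute where it is now.
    while (arrsize <= j)
    {
        j %= arrsize;
        j += (i-j+times-1)/times*times;
    }
    printf("j = %d ", j);
    PrintArray(arr, arrsize);
    ArraySwap2CellsChar(arr, i, j);
}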
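Wrapping the modified loop above into a complete function gives the sketch below. The body of ArraySwap2CellsChar() was never shown in the thread, so the version here is an assumed plain swap; the printf calls are dropped, and times is first reduced modulo arrsize, as in the question's own code.

```c
/* Assumed implementation: the original ArraySwap2CellsChar() was not shown. */
static void ArraySwap2CellsChar(char arr[], int a, int b)
{
    char tmp = arr[a];
    arr[a] = arr[b];
    arr[b] = tmp;
}

/* Rotate arr left by `times` cells using only pairwise swaps. */
void ArrayCharMoveLeft(char arr[], int arrsize, int times)
{
    if (arrsize < 2 || times <= 0)
        return;
    times %= arrsize;              /* multiples of arrsize are no-ops */
    if (times == 0)
        return;
    for (int i = 0; i < arrsize - 1; i++) {
        int j = i + times;
        /* The element for index i may have been swapped away already;
           compute its current position. */
        while (arrsize <= j) {
            j %= arrsize;
            j += (i - j + times - 1) / times * times;
        }
        ArraySwap2CellsChar(arr, i, j);
    }
}
```

Applied to "0123456789ABCDEF" with times = 5, this yields "56789ABCDEF01234", the same result reported later in the thread.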
Use standard library functions such as memcpy and memmove, as they are highly optimized for your platform.
Use the correct type for sizes: size_t, not int.
#include <string.h> // memcpy, memmove
char *ArrayCharMoveLeft(char *arr, const size_t arrsize, size_t ntimes)
{
    ntimes %= arrsize; // arrsize must be nonzero
    if(ntimes)
    {
        char temp[ntimes];
        memcpy(temp, arr, ntimes);
        memmove(arr, arr + ntimes, arrsize - ntimes);
        memcpy(arr + arrsize - ntimes, temp, ntimes);
    }
    return arr;
}
But you want it without the temporary array (more memory efficient, very bad performance-wise):
char *ArrayCharMoveLeft(char *arr, size_t arrsize, size_t ntimes)
{
    ntimes %= arrsize;
    while(ntimes--)
    {
        char temp = arr[0];
        memmove(arr, arr + 1, arrsize - 1);
        arr[arrsize -1] = temp;
    }
    return arr;
}
https://godbolt.org/z/od68dKTWq
https://godbolt.org/z/noah9zdYY
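A different technique that none of the answers above uses, the three-reversal rotation, also satisfies the "no temporary array" rule with only pairwise swaps and no nested loops: reverse the first ntimes cells, reverse the rest, then reverse the whole array. The sketch below is hypothetical (ArrayCharRotateLeft and reverseCells are names made up here) and keeps the size_t signature style of the memmove answer.

```c
#include <stddef.h>

/* Reverse arr[lo..hi] in place (inclusive bounds), one swap at a time. */
static void reverseCells(char *arr, size_t lo, size_t hi)
{
    while (lo < hi) {
        char t = arr[lo];
        arr[lo++] = arr[hi];
        arr[hi--] = t;
    }
}

/* Rotate arr left by ntimes cells using three in-place reversals. */
char *ArrayCharRotateLeft(char *arr, size_t arrsize, size_t ntimes)
{
    if (arrsize < 2)
        return arr;
    ntimes %= arrsize;
    if (ntimes == 0)
        return arr;
    reverseCells(arr, 0, ntimes - 1);        /* "01234"       -> "43210"       */
    reverseCells(arr, ntimes, arrsize - 1);  /* "56789ABCDEF" -> "FEDCBA98765" */
    reverseCells(arr, 0, arrsize - 1);       /* whole array   -> rotated       */
    return arr;
}
```

Each element is moved at most twice, so this runs in linear time, unlike the one-step-at-a-time memmove loop.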
Disclaimer: I'm not sure whether it's common to share full working code here, since this is literally my first question, so I'll refrain from doing so, assuming the idea is to answer specific questions rather than to provide an example solution for grabs (which might defeat the purpose of studying and exploring C). This is backed by the fact that this specific task is derived from a programming test used by a programming course, whose purpose is to filter out applicants who aren't fit for an intense 7-month training in software development. If you still wish to see my code, message me privately.
So, with a great amount of help from @Armali, I'm happy to announce the question is answered! Together we came up with a function that takes an array of characters in C (a string) and, without using any previously written libraries (such as strings.h) or even a temporary array, rotates all the cells in the array N times to the left.
Example: using ArrayCharMoveLeft() on the following array with N = 5:
Original array: 0123456789ABCDEF
Updated array: 56789ABCDEF01234
As you can see, the sixth cell (5) is now the first cell, the 7th cell is now the 2nd cell, and so on. So each cell was moved to the left 5 times. The first 5 cells 'overflow' to the end of the array and now appear as the last 5 cells, while maintaining their order.
The function works with various array lengths and N values.
This is not any sort of achievement, but rather an attempt to execute the task with as few variables as possible (only 4 ints besides the char array, also counting the helper function used to swap the cells).
It was achieved using a nested loop, so it's by no means efficient runtime-wise, only memory-wise, while still using only self-written functions and no external libraries (except stdio.h).
Refer to Armali's posted solution, it should get you the answer for this question.

Compare two arrays and create new array with equal elements in C

The problem is to check two arrays for the same integer value and put matching values in a new array.
Let's say I have two arrays
a[n] = {2,5,2,7,8,4,2}
b[m] = {1,2,6,2,7,9,4,2,5,7,3}
Each array can be a different size.
I need to check if the arrays have matching elements and put them in a new array. The result in this case should be:
array[] = {2,2,2,5,7,4}
And I need to do it in O(n log n + m log m).
I know there is a way to do with merge sorting or put one of the array in a hash array but I really don't know how to implement it.
I will really appreciate your help, thanks!!!
As you have already figured out, you can use merge sort (implementing it is beyond the scope of this answer; I suppose you can find a solution on Wikipedia or by searching Stack Overflow) so that you get n log n + m log m complexity, where n is the size of the first array and m is the size of the other.
Let's call the first array a (with the size n) and the second one b (with size m). First sort these arrays (merge sort would give us nlogn + mlogm complexity). And now we have:
a[n] // {2,2,2,4,5,7,8} and b[m] // {1,2,2,2,3,4,5,6,7,7,9}
Supposing n <= m, we can simply iterate over both arrays simultaneously, comparing corresponding values.
But first, let's allocate an array int c[n]; to store the results (you can print to the console instead of storing if you need). And now the loop itself:
int k = 0; // store the new size of c array!
for (int i = 0, j = 0; i < n && j < m; )
{
    if (a[i] == b[j])
    {
        // match found, store it
        c[k] = a[i];
        ++i; ++j; ++k;
    }
    else if (a[i] > b[j])
    {
        // current value in a is leading, go to next in b
        ++j;
    }
    else
    {
        // the last possibility is a[i] < b[j] - b is leading
        ++i;
    }
}
Note: the loop itself is O(n + m) at worst (remember the n <= m assumption), which is less than the cost of sorting, so the overall complexity is O(n log n + m log m). Now you can iterate over the c array (its allocated size is n, but the number of elements in it is k) and do what you need with those numbers.
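Putting the two steps together, a minimal self-contained sketch might look like this. It assumes the standard qsort stands in for merge sort; note that the C standard does not specify which algorithm qsort uses or guarantee an O(n log n) worst case. The names cmpInt and intersectSorted are hypothetical.

```c
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow that a - b can cause. */
static int cmpInt(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts a (length n) and b (length m) in place, then writes their common
   elements, with multiplicity, to c, which needs room for min(n, m) ints.
   Returns k, the number of elements written. */
size_t intersectSorted(int *a, size_t n, int *b, size_t m, int *c)
{
    qsort(a, n, sizeof *a, cmpInt);
    qsort(b, m, sizeof *b, cmpInt);
    size_t i = 0, j = 0, k = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            c[k++] = a[i];       /* match: keep it, advance both */
            i++;
            j++;
        } else if (a[i] > b[j]) {
            j++;                 /* b is behind: advance b */
        } else {
            i++;                 /* a is behind: advance a */
        }
    }
    return k;
}
```

With the arrays from the question, k is 6 and c starts with 2, 2, 2, 4, 5, 7 (the same values as the expected result, in sorted order).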
From the way that you explain it the way to do this would be to loop over the shorter array and check it against the longer array. Let us assume that A is the shorter array and B the longer array. Create a results array C.
Loop over each element in A, call it I
If I is found in B, remove it from B and put it in C, break out of the test loop.
Now go to the next element in A.
This means that if a number I is found twice in A and three times in B, then I will only appear twice in C. Once you finish, then every number found in both arrays will appear in C the number of times that it actually appears in both.
I am carefully not putting in suggested code as your question is about a method that you can use. You should figure out the code yourself.
I would be inclined to take the following approach:
1) Sort array B. There are many well published sort algorithms to do this, as well as several implementations in various generally available libraries.
2) Loop through array A and for each element do a binary search (or other suitable algorithm) on array B for a match. If a match is found, remove the element from array B (to avoid future matches) and add it to the output array.
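The two steps above can be sketched in C as follows, assuming qsort for step 1 and the standard bsearch for step 2; "removing" a match is done here by shifting the rest of the sorted B left with memmove, so B's length shrinks. That shift costs O(m) per match, so this sketch does not by itself meet the O(n log n + m log m) bound from the question. The names cmpInt and intersectBsearch are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort/bsearch comparator for ints. */
static int cmpInt(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts b in place, then for each element of a binary-searches b; each
   match is appended to c and removed from b (*m shrinks accordingly).
   c needs room for min(n, *m) ints. Returns the number written to c. */
size_t intersectBsearch(const int *a, size_t n, int *b, size_t *m, int *c)
{
    qsort(b, *m, sizeof *b, cmpInt);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        int *hit = bsearch(&a[i], b, *m, sizeof *b, cmpInt);
        if (hit != NULL) {
            c[k++] = *hit;
            size_t idx = (size_t)(hit - b);
            /* Remove the match so it cannot be matched again. */
            memmove(hit, hit + 1, (*m - idx - 1) * sizeof *b);
            (*m)--;
        }
    }
    return k;
}
```

Because A is not sorted here, the matches come out in A's order: for the question's arrays, c holds 2, 5, 2, 7, 4, 2.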

Implementing Radix sort in java - quite a few questions

Although it is not clearly stated in my exercise, I am supposed to implement radix sort recursively. I've been working on the task for days, but unfortunately I have only managed to produce garbage. We are required to work with two methods. The sort method receives an array with numbers ranging from 0 to 999 and the digit we are looking at. We are supposed to generate a two-dimensional matrix here in order to distribute the numbers inside the array. So, for example, 523 is positioned at the fifth row and 27 is positioned at the 0th row, since it is interpreted as 027.
I tried to do this with the help of a switch-case construct, dividing the numbers inside the array by 100 and then positioning each number according to the result. Then I somehow tried to build buckets that include only the numbers with the same digit, so for example, 237 and 247 would be thrown into the same bucket in the first "round". I tried to do this by taking the whole row of the "fields" matrix where we put the values before.
In the putInBucket method, I am required to extend the bucket (which I think I managed to do correctly) and then return it.
I am sorry, I know that the code is total garbage, but maybe there's someone out there who understands what I am up to and can help me a little bit.
I simply don't see how I need to work with the buckets here, I don't even understand why I have to extend them, and I don't see any way of returning them to the sort method (which, I think, I am required to do).
Further description:
The whole thing is meant to work as follows: we take an array with integers ranging from 0 to 999. Every number is then sorted by its first digit, as mentioned above. Imagine you have buckets numbered 0 to 9. You start the sorting by putting 523 in bucket 5, 672 in bucket 6, and so on. This is easy when there is only one number (or no number at all) in a bucket. But it gets harder (and that's where recursion might come in handy) when you want to put more than one number in one bucket. The mechanism now goes as follows: we put two numbers with the same first digit in one bucket, for example 237 and 245. Now we want to sort these numbers again with the same algorithm, meaning we call the sort method (somehow) again with an array that contains only these two numbers and sort them again, but this time by looking at the second digit, so we would compare 3 and 4. We sort every number inside the array like this, and at the end, in order to get a sorted array, we start at the end, meaning at bucket 9, and then just put everything together. When we reach bucket 2, the algorithm would look into the recursive step and already receive the sorted array [237, 245] and deliver it in order to complete the whole thing.
My own problems:
I don't understand why we need to extend a bucket, and I can't figure it out from the description; it is simply stated that we are supposed to do so. I'd imagine that we do it to copy another element into it, because if we have the buckets from 0 to 9, putting two numbers into the same bucket would just overwrite the first value. This might be the reason why we need to return the new, extended bucket, but I am not sure about that. Plus, I don't know how to go further from there. Even if I have an extended bucket now, it's not like I can simply stick it into the old matrix and copy another element into it again.
public static int[] sort(int[] array, int digit) {
    if (array.length == 0)
        return array;
    int[][] fields = new int[10][array.length];
    int[] bucket = new int[array.length];
    int i = 0;
    for (int j = 0; j < array.length; j++) {
        switch (array[j] / 100) {
            case 0: i = 0; break;
            case 1: i = 1; break;
            ...
        }
        fields[i][j] = array[j];
        bucket[i] = fields[i][j];
    }
    return bucket;
}
private static int[] putInBucket(int [] bucket, int number) {
    int[] bucket_new = new int[bucket.length+1];
    for (int i = 1; i < bucket_new.length; i++) {
        bucket_new[i] = bucket[i-1];
    }
    return bucket_new;
}
public static void main (String [] argv) {
    int[] array = readInts("Please type in the numbers: ");
    int digit = 0;
    int[] bucket = sort(array, digit);
}
You don't use digit in sort; that's quite suspicious.
The switch/case looks like a rather convoluted way to write i = array[j] / 100.
I'd recommend reading the Wikipedia description of radix sort.
The expression to extract a digit from a base-10 number is (int) (number / Math.pow(10, digit)) % 10 (the cast matters, since Math.pow returns a double).
Note that you can count digits from left to right or from right to left; make sure you get this right.
I suppose you first want to sort by digit 0, then by digit 1, then by digit 2. So there should be a recursive call at the end of sort that does this.
Your buckets array needs to be 2-dimensional. You'll need to call it this way: buckets[i] = putInBucket(buckets[i], array[j]). If you handle null in putInBucket, you don't need to initialize it.
The reason why you need a 2D bucket array and putInBucket (instead of your fixed-size fields) is that you don't know how many numbers will end up in each bucket.
The second phase (reading back from the buckets into the array) is missing before the recursive call.
Make sure to stop the recursion after 3 digits.
Good luck
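The recipe in this answer can be sketched in C (the question's code is Java; this is an illustrative translation, not the course's expected solution). Assumptions: digit 0 is the ones digit, so this is a least-significant-digit-first radix sort; the C putInBucket analogue grows a bucket with realloc, which treats a NULL bucket like malloc (the "handle null" advice); and integer arithmetic replaces Math.pow. The names putInBucket, digitOf and radixSort are hypothetical.

```c
#include <stdlib.h>

/* C analogue of putInBucket(): returns the bucket grown by one slot with
   `number` appended. A NULL bucket with len 0 is fine for realloc. */
static int *putInBucket(int *bucket, size_t len, int number)
{
    int *grown = realloc(bucket, (len + 1) * sizeof *grown);
    if (grown == NULL)
        abort();                 /* out of memory: give up in this sketch */
    grown[len] = number;
    return grown;
}

/* Digit `digit` of a base-10 number, counting from the right (0 = ones). */
static int digitOf(int number, int digit)
{
    int divisor = 1;
    for (int d = 0; d < digit; d++)
        divisor *= 10;
    return (number / divisor) % 10;
}

/* Recursive radix sort for values 0..999: one digit per call. */
void radixSort(int *array, size_t n, int digit)
{
    if (digit >= 3)              /* stop after 3 digits */
        return;
    int *buckets[10] = {0};      /* all NULL: no bucket allocated yet */
    size_t lens[10] = {0};
    /* Phase 1: distribute into buckets, preserving input order. */
    for (size_t j = 0; j < n; j++) {
        int i = digitOf(array[j], digit);
        buckets[i] = putInBucket(buckets[i], lens[i], array[j]);
        lens[i]++;
    }
    /* Phase 2: read the buckets back into the array, 0 through 9. */
    size_t k = 0;
    for (int i = 0; i < 10; i++) {
        for (size_t t = 0; t < lens[i]; t++)
            array[k++] = buckets[i][t];
        free(buckets[i]);
    }
    radixSort(array, n, digit + 1);   /* next digit to the left */
}
```

Each pass is stable (buckets keep arrival order), which is what lets the later, more significant digits preserve the order established by the earlier ones.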

Grid containing apples

I found this question on a programming forum:
A table composed of N*M cells, each having a certain quantity of apples, is given. You start from the upper-left corner. At each step you can go down or right one cell. Design an algorithm to find the maximum number of apples you can collect if you are moving from the upper-left corner to the bottom-right corner.
I have thought of three approaches with different complexities [in terms of time and space]:
Approach 1 [quickest]:
for(j=1,i=0;j<column;j++)
    apple[i][j]=apple[i][j-1]+apple[i][j];
for(i=1,j=0;i<row;i++)
    apple[i][j]=apple[i-1][j]+apple[i][j];
for(i=1;i<row;i++)
{
    for(j=1;j<column;j++)
    {
        if(apple[i][j-1]>=apple[i-1][j])
            apple[i][j]=apple[i][j]+apple[i][j-1];
        else
            apple[i][j]=apple[i][j]+apple[i-1][j];
    }
}
printf("\n maximum apple u can pick=%d",apple[row-1][column-1]);
Approach 2:
result is the temporary array having all slots initially 0.
int getMax(int i, int j)
{
    if( (i<ROW) && (j<COL) )
    {
        if( result[i][j] != 0 )
            return result[i][j];
        else
        {
            int right = getMax(i, j+1);
            int down = getMax(i+1, j);
            result[i][j] = ( (right>down) ? right : down )+apples[i][j];
            return result[i][j];
        }
    }
    else
        return 0;
}
Approach 3 [least space used]:
It doesn't use any temporary array.
int getMax(int i, int j)
{
    if( (i<M) && (j<N) )
    {
        int right = getMax(i, j+1);
        int down = getMax(i+1, j);
        return apples[i][j]+(right>down?right:down);
    }
    else
        return 0;
}
I want to know which is the best way to solve this problem?
There's little difference between approaches 1 and 2; approach 1 is probably a wee bit better, since it doesn't need the recursion stack that approach 2 uses (approach 2 works backwards from the corner).
Approach 3 has exponential time complexity, so it is much worse than the other two, which have complexity O(rows*columns).
You can make a variant of approach 1 that proceeds along a diagonal to use only O(max{rows,columns}) additional space.
In terms of time, solution 1 is the best because there is no recursive function.
Calling a recursive function takes time.
Improvement to First Approach
Do you really need the temporary array to be N by M?
No.
If the initial 2-d array has N columns, and M rows, we can solve this with a 1-d array of length M.
Method
In your first approach you save all of the subtotals as you go, but you really only need to know the apple-value of the cell to the left and above when you move to the next column. Once you have determined that, you don't look at those previous cells ever again.
The solution then is to write-over the old values when you start on the next column over.
The code will look like the following (I'm not actually a C programmer, so bear with me):
The Code
int getMax()
{
    //apple[][] is the original apple array, indexed apple[row][column]
    //N is # of columns of apple[][]
    //M is # of rows of apple[][]
    //temp[] is initialized to zeroes, and has length M
    for (int currentCol = 0; currentCol < N; currentCol++)
    {
        temp[0] += apple[0][currentCol]; //Nothing above top row
        for (int i = 1; i < M; i++)
        {
            int applesToLeft = temp[i];   //still holds the previous column's total
            int applesAbove = temp[i-1];  //already updated for this column
            if (applesToLeft > applesAbove)
            {
                temp[i] = applesToLeft + apple[i][currentCol];
            }
            else
            {
                temp[i] = applesAbove + apple[i][currentCol];
            }
        }
    }
    return temp[M - 1];
}
Note: there isn't any reason to actually store the values of applesToLeft and applesAbove into local variables, and feel free to use the ? : syntax for the assignment.
Also, if there are fewer columns than rows, you should rotate this so the 1-d array has the shorter length.
Doing it this way is a direct improvement over your first approach, as it saves memory, and iterating over the same 1-d array really helps with caching.
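A self-contained variant of the same one-dimensional idea, sketched under stated assumptions: it sweeps row by row instead of column by column (so the scratch array has one entry per column), and it takes the grid as a flattened row-major array, apple[r * cols + c]. The name maxApples is hypothetical, and it returns -1 if allocation fails.

```c
#include <stdlib.h>

/* Maximum apples on a right/down path from the top-left to the
   bottom-right cell. best[c] holds the best total for column c of the
   current row; until it is overwritten, it still holds the row above. */
long maxApples(const int *apple, int rows, int cols)
{
    if (rows < 1 || cols < 1)
        return 0;
    long *best = malloc((size_t)cols * sizeof *best);
    if (best == NULL)
        return -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            long prev;
            if (r == 0 && c == 0)
                prev = 0;
            else if (r == 0)
                prev = best[c - 1];              /* only from the left */
            else if (c == 0)
                prev = best[c];                  /* only from above */
            else
                prev = best[c] > best[c - 1] ? best[c] : best[c - 1];
            best[c] = prev + apple[r * cols + c];
        }
    }
    long result = best[cols - 1];
    free(best);
    return result;
}
```

For the 3x3 grid 1 2 3 / 4 5 6 / 7 8 9, the best path is 1, 4, 7, 8, 9, for a total of 29.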
I can only think of one reason to use a different approach:
Multi-Threading
To gain the benefits of multi-threading for this problem, your 2nd approach is just about right.
In your second approach you use a memo to store the intermediate results.
If you make your memo thread-safe (by locking or using a lock-free hash set), then you can start multiple threads all trying to get the answer for the bottom-right corner.
[Edit: actually, since assigning ints into an array is an atomic operation, I don't think you would need to lock at all.]
Make each call to getMax choose randomly whether to do the left getMax or above getMax first.
This means that each thread works on a different part of the problem and since there is the memo, it won't repeat work a different thread has already done.

How do you remove a cycle of integers (e.g. 1-2-3-1) from an array?

If you have an array of integers, such as 1 2 5 4 3 2 1 5 9
What is the best way in C to remove cycles of integers from an array?
i.e. above, 1-2-5-4-3-2-1 is a cycle and should be removed to be left with just 1 5 9.
How can I do this?
Thanks!!
A straightforward search in an array could look like this:
#include <string.h> // memmove
int arr[] = {1, 2, 5, 4, 3, 2, 1, 5, 9};
int len = 9;
int i, j;
for (i = 0; i < len; i++) {
    for (j = 0; j < i; j++) {
        if (arr[i] == arr[j]) {
            // remove the elements from j up to (but not including) i;
            // arr[i] slides down into position j
            memmove(&arr[j], &arr[i], (len-i)*sizeof(int));
            len -= i-j;
            i = j;
            break;
        }
    }
}
Build a graph and select edges based on running depth first search on it.
Mark vertices when you visit them, add edges as you traverse graph, don't add edges that have already been selected - they would connect previously visited components and therefore create a cycle.
From the array in your example we can't tell what is considered a cycle.
In your example both 2 -> 5 and 1 -> 5 as well as 1 -> 2 so in graph (?):
1 -> 2
| |
| V
+--> 5
So where is the information of which elements are connected?
There is a simple way, with O(n^2) complexity: simply iterate over each array entry from the beginning, and search the array for the last identical value. If that is in the same position as your current position, move on. Otherwise, delete the sequence (except for the initial value) and move on. You should be able to implement this using two nested for loops plus a conditional memmove (memcpy is not allowed here, because the source and destination regions can overlap).
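The simple O(n^2) method just described can be sketched like this; removeCycles is a hypothetical name, and the function returns the new logical length of the array.

```c
#include <stddef.h>
#include <string.h>

/* For each position i, find the last later occurrence of arr[i]. If there
   is one, delete arr[i+1..last] so only the first copy of the repeated
   value is kept. Returns the new length. */
size_t removeCycles(int *arr, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        size_t last = i;
        for (size_t k = i + 1; k < len; k++)
            if (arr[k] == arr[i])
                last = k;
        if (last != i) {
            /* Source and destination overlap, so use memmove. */
            memmove(&arr[i + 1], &arr[last + 1], (len - last - 1) * sizeof *arr);
            len -= last - i;
        }
    }
    return len;
}
```

On 1 2 5 4 3 2 1 5 9, the first iteration finds the last 1 at index 6 and deletes everything between the two 1s, leaving 1 5 9 as the question expects.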
There is a more complex way, with O(n log n) complexity. If your data set is large, this one will be preferable for performance, though it is more complex to implement and therefore more error-prone.
1) Sort the array - this is the O(n log n) part if you use a good sorting algorithm. Do so by reference - you want to keep the original. This moves all identical values together. Break sort-order ties by position in the original array, this will help in the next step.
2) Iterate once over the sorted array (O(n)), looking for runs of the same value. Because these runs are themselves sorted by position, you can trivially find each cycle involving that value by comparing adjacent pairs for equality. Erase (not delete) each cycle from the original array by replacing each value except the last with a sentinel (zero might work). Don't close the gaps yet, or the references will break.
NB: At this stage you need to ignore any endpoints that have already been erased from the array. Because they will resolve to sentinels, you simply have to be careful to not erase "runs" that involve the sentinel value at either end.
3) Throw away the sorted array, and use the sentinels to close the gaps in the original array. This should be O(n).
Actually implementing this in any given language is left as an exercise for the reader. :-)
