Implementing radix sort in Java - quite a few questions - arrays

Although it is not clearly stated in my exercise, I am supposed to implement radix sort recursively. I've been working on the task for days, but so far I have only managed to produce garbage, unfortunately. We are required to work with two methods. The sort method receives an array with numbers ranging from 0 to 999 and the digit we are currently looking at. We are supposed to generate a two-dimensional matrix here in order to distribute the numbers of the array into it. So, for example, 523 is placed in the fifth row and 27 is placed in the 0th row, since it is interpreted as 027.
I tried to do this with the help of a switch-case construct, dividing each number in the array by 100, checking the result and then positioning the number accordingly. Then I somehow tried to build buckets that contain only the numbers with the same digit, so, for example, 237 and 247 would be thrown into the same bucket in the first "round". I tried to do this by taking the whole row of the "fields" matrix where we put the values before.
In the putInBucket method, I am required to extend the bucket (which I managed to do correctly, I guess) and then return it.
I am sorry, I know that the code is total garbage, but maybe there is someone out there who understands what I am trying to do and can help me a little bit.
I simply don't see how I need to work with the buckets here. I don't even understand why I have to extend them, and I don't see any way to return one back to the sort method (which, I think, I am required to do).
Further description:
The whole thing is meant to work as follows: We take an array with integers ranging from 0 to 999. Every number is then sorted by its first digit, as mentioned above. Imagine you have buckets labelled 0 to 9. You start the sorting by putting 523 into bucket 5, 672 into bucket 6, and so on. This is easy as long as there is only one number (or no number at all) in a bucket. It gets harder (and that's where recursion comes in handy) when more than one number ends up in the same bucket. The mechanism then goes as follows: we put two numbers with the same first digit, for example 237 and 245, into one bucket. Now we want to sort these numbers again with the same algorithm, meaning we call the sort method (somehow) again with an array that contains only these two numbers, but this time we look at the second digit, so we would compare 3 and 4. We sort every number in the array like this, and at the end, in order to get a sorted array, we start at the end, meaning at bucket 9, and just put everything together. When we reach bucket 2, the algorithm would look into the recursive step, already receive the sorted array [237, 245], and deliver it in order to complete the whole thing.
My own problems:
I don't understand why we need to extend a bucket, and I can't figure it out from the description; it is simply stated that we are supposed to do so. I imagine we would do it in order to copy another element into it, because if we have the buckets 0 to 9 as fixed rows, putting two numbers into the same bucket would just mean that we overwrite the first value. This might be the reason why we need to return the new, extended bucket, but I am not sure about that. Plus, I don't know how to go on from there. Even if I have an extended bucket now, it's not like I can simply stick it back onto the old matrix and copy another element into it again.
public static int[] sort(int[] array, int digit) {
if (array.length == 0)
return array;
int[][] fields = new int[10][array.length];
int[] bucket = new int[array.length];
int i = 0;
for (int j = 0; j < array.length; j++) {
switch (array[j] / 100) {
case 0: i = 0; break;
case 1: i = 1; break;
...
}
fields[i][j] = array[j];
bucket[i] = fields[i][j];
}
return bucket;
}
private static int[] putInBucket(int [] bucket, int number) {
int[] bucket_new = new int[bucket.length+1];
for (int i = 1; i < bucket_new.length; i++) {
bucket_new[i] = bucket[i-1];
}
return bucket_new;
}
public static void main (String [] argv) {
int[] array = readInts("Please type in the numbers: ");
int digit = 0;
int[] bucket = sort(array, digit);
}

You don't use digit in sort; that's quite suspicious.
The switch/case looks like a quite convoluted way to write i = array[j] / 100.
I'd recommend reading the Wikipedia description of radix sort.
The expression to extract a digit from a base-10 number is (number / Math.pow(10, digit)) % 10, but do it in integer arithmetic (use an int power of ten or cast the Math.pow result), since Math.pow returns a double.
Note that you can count digits from left to right or from right to left; make sure you get this right.
I suppose you first want to sort by digit 0, then by digit 1, then by digit 2. So there should be a recursive call at the end of sort that does this.
Your buckets array needs to be two-dimensional. You'll need to call it this way: buckets[i] = putInBucket(buckets[i], array[j]). If you handle null in putInBucket, you don't need to initialize it.
The reason why you need a 2D bucket array and putInBucket (instead of your fixed-size fields matrix) is that you don't know in advance how many numbers will end up in each bucket.
The second phase (reading back from the buckets into the array) is missing before the recursive call.
Make sure to stop the recursion after 3 digits.
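To make the structure concrete, here is a rough sketch of that flow. It is deliberately written in C++ (with std::vector standing in for the hand-grown bucket arrays), and the name sortByDigit is made up, so treat it as an outline of the control flow rather than a finished solution:
#include <array>
#include <vector>

// MSD radix sort for numbers in 0..999, following the steps above.
std::vector<int> sortByDigit(const std::vector<int>& numbers, int digit) {
    if (numbers.size() <= 1 || digit == 3)          // nothing left to split, or all 3 digits used
        return numbers;

    static const int divisors[] = {100, 10, 1};     // digit 0 = hundreds, 1 = tens, 2 = ones
    std::array<std::vector<int>, 10> buckets;       // growable buckets, one per digit value

    for (int n : numbers)                           // distribute by the current digit
        buckets[(n / divisors[digit]) % 10].push_back(n);

    std::vector<int> result;
    for (const std::vector<int>& bucket : buckets) {  // recurse into each bucket, then concatenate
        std::vector<int> sorted = sortByDigit(bucket, digit + 1);
        result.insert(result.end(), sorted.begin(), sorted.end());
    }
    return result;
}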
Good luck

Related

Given an array of integers of size n+1 consisting of elements from [1,n], where all elements are unique except one which is duplicated k times

I have been attempting to solve the following problem:
You are given an array of n+1 integers where all the elements lie in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
#include <vector>

struct LatticePoint{ // to hold the duplicate and k
int a;
int b;
LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};

LatticePoint findDuplicateAndK(const std::vector<int>& A){
int n = A.size() - 1;
std::vector<int> Numbers (n);
for(int i = 0; i < n + 1; ++i){
++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-bounds access
}
int i = 0;
while(i < n){
if(Numbers[i] > 1) {
int duplicate = i + 1;
int k = Numbers[i] - 1;
LatticePoint result{duplicate, k};
return result;
}
++i;
}
return LatticePoint{-1, 0}; // not reached for a valid input
}
So, the basic idea is this: we go along the array, and each time we see the number A[i] we increment the value of Numbers[A[i] - 1]. Since only the duplicate appears more than once, the index of the entry of Numbers with a value greater than 1 gives the duplicate number (index plus one), and the number of duplications is that entry's value minus 1. This algorithm is O(n) in time complexity and O(n) in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).
You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)
That's still O(n) space, but the constant factor is a lot smaller.
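A minimal sketch of that idea, with std::vector<bool> (which is bit-packed) standing in for boost::dynamic_bitset; the function name is made up and the input is assumed to actually contain a duplicate:
#include <cstddef>
#include <utility>
#include <vector>

std::pair<int, int> findDupAndCount(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    std::vector<bool> seen(n + 1, false);        // one "already seen" bit per value 1..n
    int dup = -1;
    std::size_t i = 0;
    for (; i < A.size(); ++i) {
        if (seen[A[i]]) { dup = A[i]; break; }   // second sighting of some value
        seen[A[i]] = true;
    }
    int count = 2;                               // by now we have seen exactly two of them
    for (++i; i < A.size(); ++i)
        if (A[i] == dup) ++count;
    return std::make_pair(dup, count);
}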
The idea of your code works.
But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.
If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.
So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) extra data, in ceil(log_2(n)) passes, each taking O(n) time. Therefore we get the answer in O(n log(n)) time.
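A sketch of that two-bucket binary search (the function name is made up; a final O(n) pass over the array then gives the count of the value it returns):
#include <vector>

// Binary search on the value range: O(1) extra space, ceil(log2(n)) passes of O(n) each.
int findDuplicateValue(const std::vector<int>& A) {
    int lo = 1, hi = static_cast<int>(A.size()) - 1;   // candidate values lie in [1, n]
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int inLowerHalf = 0;
        for (int x : A)
            if (lo <= x && x <= mid) ++inLowerHalf;
        // The half holding the duplicate has more elements than it has distinct slots.
        if (inLowerHalf > mid - lo + 1)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}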
We don't have to use 2 buckets, though. If we used 3, we'd take ceil(log_3(n)) passes. So as we increase the fixed number of buckets, we take more space and save time. Are there other tradeoffs?
Well, you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) buckets. And 3 passes is possible with the cube root. And so on.
So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most space and the least time.
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
#include <utility>
#include <vector>
std::pair<int, int> count_dup(std::vector<int> A) {
/* Try to put each element in its "home" position (that is,
* where the value is the same as the index). Since the
* values start at 1, A[0] isn't home to anyone, so we start
* the loop at 1.
*/
int n = A.size();
for (int i = 1; i < n; ++i) {
while (A[i] != i) {
int j = A[i];
if (A[j] == j) {
/* j is the duplicate. Now we need to count them.
* We have one at i. There's one at j, too, but we only
* need to add it if we're not going to run into it in
* the scan. And there might be one at position 0. After that,
* we just scan through the rest of the array.
*/
int count = 1;
if (A[0] == j) ++count;
if (j < i) ++count;
for (++i; i < n; ++i) {
if (A[i] == j) ++count;
}
return std::make_pair(j, count);
}
/* This swap can only happen once per element. */
std::swap(A[i], A[j]);
}
}
/* If we get here, every element from 1 to n is at home.
* So the duplicate must be A[0], and the duplicate count
* must be 2.
*/
return std::make_pair(A[0], 2);
}
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.
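A sketch of that idea using std::atomic and a C++17 parallel algorithm; the names are illustrative and the thread scheduling is left entirely to the library:
#include <algorithm>
#include <atomic>
#include <execution>
#include <utility>
#include <vector>

// Parallel test-and-set over an array of atomic flags, as described above.
std::pair<int, int> findDuplicateParallel(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    std::vector<std::atomic<bool>> seen(n + 1);
    for (auto& flag : seen) flag.store(false);     // start with every value unseen

    std::atomic<int> duplicate{-1};
    std::atomic<int> count{1};                     // ends up as the duplicate's total occurrence count
    std::for_each(std::execution::par, A.begin(), A.end(), [&](int v) {
        if (seen[v].exchange(true)) {              // test-and-set; true means v was seen before
            duplicate.store(v);
            count.fetch_add(1);
        }
    });
    return {duplicate.load(), count.load()};
}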

How to solve a runtime error happening when I use a big size of static array

My development environment: Visual Studio
Now, I have to create an input file and print random numbers from 1 to 500000 into the file without duplicates. First, I considered that if I use a big size of local array, problems related to heap may happen. So, I tried to declare it as a static array. Then, in the main function, I put random numbers without overlapping into the array and wrote the numbers to the input file by accessing the array elements. However, runtime errors(the continuous blinking of the cursor in the console window) continue to occur.
The source code is as follows.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 500000
int sort[SIZE];
int main()
{
FILE* input = NULL;
input = fopen("input.txt", "w");
if (input != NULL)
{
srand((unsigned)time(NULL));
for (int i = 0; i < SIZE; i++)
{
sort[i] = (rand() % SIZE) + 1;
for (int j = 0; j < i; j++)
{
if (sort[i] == sort[j])
{
i--;
break;
}
}
}
for (int i = 0; i < SIZE; i++)
{
fprintf(input, "%d ", sort[i]);
}
fclose(input);
}
return 0;
}
When I tried to reduce the array size from 1 to 5000, it has been implemented. So, Carefully, I think it's a memory out phenomenon. Finally, I'd appreciate it if you could comment on how to solve this problem.
“First, I considered that if I use a big size of local array, problems related to heap may happen.”
That does not make any sense. Automatic local objects generally come from the stack, not the heap. (Also, “heap” is the wrong word; a heap is a particular kind of data structure, but the malloc family of routines may use other data structures for managing memory. This can be referred to simply as dynamically allocated memory or allocated memory.)
However, runtime errors(the continuous blinking of the cursor in the console window)…
Continuous blinking of the cursor is normal operation, not a run-time error. Perhaps you are trying to say your program continues executing without ever stopping.
#define SIZE 500000
...
sort[i] = (rand() % SIZE) + 1;
The C standard only requires rand to generate numbers from 0 to 32767. Some implementations may provide more. However, if your implementation does not generate numbers up to 499,999, then it will never generate the numbers required to fill the array using this method.
Also, using % to reduce the rand result skews the distribution. For example, if we were reducing modulo 30,000, and rand generated numbers from 0 to 44,999, then rand() % 30000 would generate the numbers from 0 to 14,999 each two times out of every 45,000 and the numbers from 15,000 to 29,999 each one time out of every 45,000.
for (int j = 0; j < i; j++)
So this algorithm attempts to find new numbers by rejecting those that duplicate previous numbers. When working on the last of n numbers, the average number of tries is n, if the selection of random numbers is uniform. When working on the second-to-last number, the average is n/2. When working on the third-to-last, the average is n/3. So the average number of tries for all the numbers is n + n/2 + n/3 + n/4 + n/5 + … + 1.
For 5000 elements, this sum is around 45,472.5. For 500,000 elements, it is around 6,849,790. So your program will average around 150 times as many tries with 500,000 elements as with 5,000. However, each try also takes longer: For the first try, you check against zero prior elements for duplicates. For the second, you check against one prior element. For try n, you check against n−1 elements. So, for the last of 500,000 elements, you check against 499,999 elements, and, on average, you have to repeat this 500,000 times. So the last number takes around 500,000•499,999 = 249,999,500,000 units of work.
Refining this estimate, for each selection i, a successful attempt that gets completely through the loop of checking requires checking against all i−1 prior numbers. An unsuccessful attempt will average going halfway through the prior numbers. So, for selection i, there is one successful check of i−1 numbers and, on average, n/(n+1−i) unsuccessful checks of an average of (i−1)/2 numbers.
For 5,000 numbers, the average number of checks will be around 107,455,347. For 500,000 numbers, the average will be around 1,649,951,055,183. Thus, your program with 500,000 numbers takes more than 15,000 times as long as with 5,000 numbers.
When I tried to reduce the array size from 1 to 5000, it has been implemented.
I think you mean that with an array size of 5,000, the program completes execution in a short amount of time?
So, Carefully, I think it's a memory out phenomenon.
No, there is no memory issue here. Modern general-purpose computer systems easily handle static arrays of 500,000 int.
Finally, I'd appreciate it if you could comment on how to solve this problem.
Use a Fisher-Yates shuffle: Fill the array A with the integers from 1 to SIZE. Set a counter, say d, to the number of selections completed so far, initially zero. Then pick a random index r between d and SIZE-1 (inclusive) and swap A[r] with A[d]. Then increment d. Repeat until d reaches SIZE-1.
This will swap a random element of the initial array into A[0], then a random element from those remaining into A[1], then a random element from those remaining into A[2], and so on. (We stop when d reaches SIZE-1 rather than when it reaches SIZE because, once d reaches SIZE-1, there is only one more selection to make, but there is also only one number left, and it is already in the last position in the array.)
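The question is about C, but here is a compact sketch of that shuffle in C++, using <random> so the generator actually covers 1..500000 and avoids the modulo bias discussed above; std::shuffle would do the same swapping loop in a single call:
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    const int SIZE = 500000;
    std::vector<int> numbers(SIZE);
    std::iota(numbers.begin(), numbers.end(), 1);          // fill with 1..SIZE

    std::mt19937 rng{std::random_device{}()};
    for (int d = 0; d < SIZE - 1; ++d) {
        // pick a random position from the not-yet-fixed tail [d, SIZE-1]
        std::uniform_int_distribution<int> pick(d, SIZE - 1);
        std::swap(numbers[d], numbers[pick(rng)]);
    }

    std::FILE* out = std::fopen("input.txt", "w");
    if (out == nullptr) return 1;
    for (int v : numbers) std::fprintf(out, "%d ", v);
    std::fclose(out);
    return 0;
}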

Finding the number of different elements in an array

We have an array of size n. How can we find how many different types of elements the array contains, and what the count of each one is?
For example: for {1,-5,2,-5,2,7,-5,-5} we have 4 different types, and the array of the counts will be {1,2,1,4}.
So my questions are:
How can we find how many different elements there are in the array?
How can we count the amount of each one?
Now, I am trying to solve it in Omega(n). I have tried a lot but haven't found a way. I tried to solve it with hash tables.
You are trying to get the frequency of each element in an array.
Initialize a hash where every new key is initialized with the value 0.
Loop through the array, add each element as a key to the hash, and increment its value.
In JavaScript:
hash = {};
a = [1,-5,2,-5,2,7,-5,-5];
for(var i = 0; i < a.length; ++i) {
if(hash[a[i]] === undefined)
hash[a[i]] = 0
hash[a[i]] = hash[a[i]] + 1;
}
console.log(hash);
The syntax and specific data structures you use will vary between languages, but the basic idea would be to store a running count of the number of instances of each value in an associative data structure (HashMap, Dictionary, whatever your language calls it).
Here is an example that will work in Java (I took a guess at the language you were using).
It's probably bad Java, but it illustrates the idea.
int[] myArray = {1,-5,2,-5,2,7,-5,-5};
HashMap<Object,Integer> occurrences = new HashMap<Object,Integer>();
for (int i=0;i<myArray.length;i++)
{
if (occurrences.get(myArray[i]) == null)
{
occurrences.put(myArray[i],1);
}
else
{
occurrences.put(myArray[i],occurrences.get(myArray[i])+1);
}
}
You can then use your HashMap to look up the distinct elements of the array like this
occurrences.keySet()
Other languages have their own hash map implementations (Dictionaries in .NET and Python, Hashes in Ruby).
There are different approaches to solving this problem. The question asked here might be posed in different ways. Here is a simple way to do it with std::map, which is available in the standard library. But remember that it will always be sorted by key.
#include <iostream>
#include <map>
using namespace std;

int main()
{
int arr[]={1,-5,2,-5,2,7,-5,-5};
int n=sizeof(arr)/sizeof(arr[0]);
map<int,int>v;
for(int i=0;i<n;i++)
{
if(v[arr[i]])
v[arr[i]]++;
else
v[arr[i]]=1;
}
map<int,int>::iterator it;
for(it=v.begin();it!=v.end();++it)
cout<<it->first<<" "<<it->second<<endl;
return 0;
}
it will show output like
-5 4
1 1
2 2
7 1
I suggest you read about 'Counting Sort'.
Although I am not sure I understood correctly what you actually want to ask, I think you want to:
1.) Scan an array and come up with the frequency of each unique element in that array.
2.) Get the total number of unique elements.
3.) Do all that in linear computational time.
I think what you need is Counting Sort. See the algorithm on Wikipedia.
You can obviously skip the sorting part. But you must see how it does the sorting (the useful part for your problem). It first calculates a histogram (an array whose size covers the range of key values in your original array) of the frequency of each key. This works for integers only (although you can always handle other types by mapping them to integer keys).
So, every index of this histogram array will correspond to an element in your original array, and the value at this index will correspond to the frequency of this element in the original array.
For Example;
your array x = {3, 4, 3, 3, 1, 0, 1, 3}
//after calculation, you will get
your histogram array h[0 to 4] = {1, 2, 0, 4, 1}
I hope that is what you asked.
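A minimal sketch of that histogram pass, assuming non-negative integers with a known maximum value, as in the example above:
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> x = {3, 4, 3, 3, 1, 0, 1, 3};
    std::vector<int> h(5, 0);                 // one counter per possible value 0..4
    for (int v : x) ++h[v];                   // frequency of each value
    for (std::size_t v = 0; v < h.size(); ++v)
        std::cout << v << " occurs " << h[v] << " times\n";
    // number of distinct values = count of non-zero entries in h
}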

Find the largest ten numbers in an array in C

I have an array of int (the length of the array can go from 11 to 500) and I need to extract, into another array, the largest ten numbers.
So, my starting code could be this:
int arrayNumbers[n]; //input array with numbers, 11<n<500
int arrayMax[10];
for (int i=0; i<n; i++){
if(arrayNumbers[i] ....
//here, i need the code to save current int in arrayMax correctly
}
//at the end of the cycle, I want to have in arrayMax the ten largest numbers (they don't have to be ordered)
What's the most efficient way to do this in C?
Study max-heaps. Maintain a heap of size 10 and ignore all elements that spill out of it. If you face a difficulty, please ask.
EDIT:
If the number of elements is less than 20, find the n-10 smallest elements; the rest of the numbers are the top 10.
Visualize a heap here
EDIT2: Based on a comment from Sleepy head, I searched and found this (I have not tested it). You can find the kth largest element (10 in this case) in O(n) time. Then, in O(n) time, you can find the 10 elements which are greater than or equal to this kth largest number. The final complexity is linear.
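For illustration, a sketch of that size-10 min-heap idea, shown here with C++'s std::priority_queue (in C you would hand-roll a small heap, but the logic is the same):
#include <functional>
#include <queue>
#include <vector>

std::vector<int> largestTen(const std::vector<int>& a) {
    // Min-heap holding the (up to) 10 largest values seen so far.
    std::priority_queue<int, std::vector<int>, std::greater<int> > heap;
    for (int v : a) {
        if (heap.size() < 10) {
            heap.push(v);
        } else if (v > heap.top()) {       // v beats the smallest of the current top 10
            heap.pop();
            heap.push(v);
        }
    }
    std::vector<int> result;               // comes out in ascending order; order was not required
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
This is O(n log 10), effectively linear, and uses only ten extra slots of storage.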
Here is an algorithm which solves it in linear time:
Use the selection algorithm, which effectively finds the k-th element of an unsorted array in linear time. You can either use a variant of quicksort or more robust algorithms.
Get the top k using the pivot obtained in step 1.
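With the C++ standard library, that selection step is essentially one std::nth_element call; a sketch under the assumption that the array has at least 10 elements (the function name is made up):
#include <algorithm>
#include <functional>
#include <vector>

// Partition around the 10th largest element (average O(n)); afterwards the
// ten largest values occupy the first ten slots, in no particular order.
std::vector<int> largestTenBySelection(std::vector<int> a) {  // by value: nth_element reorders
    std::nth_element(a.begin(), a.begin() + 9, a.end(), std::greater<int>());
    return std::vector<int>(a.begin(), a.begin() + 10);
}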
This is my idea:
Insert the first 10 elements of your arrayNum into arrMax.
Sort those 10 elements so that arrMax[0] = min and arrMax[9] = max.
Then check the remaining elements one by one and insert every possible candidate into its right position, as follows (draft):
int r, p;
for (int k = 10; k < n; k++)
{
r = 0;
while(1)
{
if (r == 10) break; // don't read past the end of arrMax
else if (arrMax[r] > arrNum[k]) break; // position to insert the newcomer
else r++; // keep scanning
}
if (r != 0) // no need to insert number smaller than all members
{
for (p=0; p<r-1; p++) arrMax[p]=arrMax[p+1]; // shift arrMax to make space for the newcomer
arrMax[r-1] = arrNum[k]; // insert the newcomer at its position
}
} // done!
Sort the array and copy the largest 10 elements into another array.
You can use the "select" algorithm, which finds the i-th largest number (you can put any number you like in place of i), and then iterate over the array and collect the numbers that are at least as large as that i-th largest value. In your case i=10, of course.
The following example can help you. It arranges the biggest 10 elements of the original array into arrMax, assuming all numbers in the original array arrNum are positive. Based on this you can handle negative numbers as well by initializing all elements of arrMax with the smallest possible number.
Anyway, using a heap of 10 elements is a better solution than this one.
#include <stdio.h>
#include <conio.h> /* clrscr, getch */
int main()
{
int arrNum[500]={1,2,3,21,34,4,5,6,7,87,8,9,10,11,12,13,14,15,16,17,18,19,20};
int arrMax[10]={0};
int i,cur,j,nn=23,pos;
clrscr();
for(cur=0;cur<nn;cur++)
{
for(pos=9;pos>=0;pos--)
if(arrMax[pos]<arrNum[cur])
break;
for(j=1;j<=pos;j++)
arrMax[j-1]=arrMax[j];
if(pos>=0)
arrMax[pos]=arrNum[cur];
}
for(i=0;i<10;i++)
printf("%d ",arrMax[i]);
getch();
}
When improving efficiency of an algorithm, it is often best (and instructive) to start with a naive implementation and improve it. Since in your question you obviously don't even have that, efficiency is perhaps a moot point.
If you start with the simpler question of how to find the largest integer:
Initialise largest_found to INT_MIN
Iterate the array with :
IF value > largest_found THEN largest_found = value
To get the 10 largest, you perform the same algorithm 10 times, but retain the last_largest_found value and its index from the previous iteration, and modify the largest_found test thus:
IF value > largest_found &&
value <= last_largest_found &&
index != last_largest_index
THEN
largest_found = last_largest_found = value
last_largest_index = index
Start with that, then ask yourself (or here) about efficiency.

How do you remove a cycle of integers (e.g. 1-2-3-1) from an array

If you have an array of integers, such as 1 2 5 4 3 2 1 5 9
What is the best way, in C, to remove cycles of integers from an array?
I.e. above, 1-2-5-4-3-2-1 is a cycle and should be removed, leaving just 1 5 9.
How can I do this?
Thanks!!
A straightforward search in an array could look like this:
#include <string.h> /* memmove */

int arr[] = {1, 2, 5, 4, 3, 2, 1, 5, 9};
int len = 9;
int i, j;
for (i = 0; i < len; i++) {
for (j = 0; j < i; j++) {
if (arr[i] == arr[j]) {
// remove elements between i and j
memmove(&arr[j], &arr[i], (len-i)*sizeof(int));
len -= i-j;
i = j;
break;
}
}
}
Build a graph and select edges based on running depth first search on it.
Mark vertices when you visit them, add edges as you traverse the graph, and don't add edges that have already been selected - they would connect previously visited components and therefore create a cycle.
From the array in your example we can't tell what is considered a cycle.
In your example we have 2 -> 5 and 1 -> 5 as well as 1 -> 2, so as a graph (?):
1 -> 2
|    |
|    V
+--> 5
So where is the information about which elements are connected?
There is a simple way, with O(n^2) complexity: simply iterate over each array entry from the beginning, and search the array for the last identical value. If that is in the same position as your current position, move on. Otherwise, delete the sequence (except for the initial value) and move on. You should be able to implement this using two nested for loops plus a conditional memmove (the regions can overlap, so memcpy is not safe).
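A sketch of that simple approach, shown here with std::vector::erase for brevity (in C you would do the same shifting with memmove, as in the first answer):
#include <cstddef>
#include <vector>

// For each position, find the last later occurrence of the same value and
// erase everything between them (keeping the first occurrence).
void removeCycles(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t last = i;
        for (std::size_t k = i + 1; k < a.size(); ++k)
            if (a[k] == a[i]) last = k;                        // remember the last identical value
        if (last != i)
            a.erase(a.begin() + i + 1, a.begin() + last + 1);  // drop the cycle, keep a[i]
    }
}
Called on {1, 2, 5, 4, 3, 2, 1, 5, 9} this leaves {1, 5, 9}, the same result as the memmove version above.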
There is a more complex way, with O(n log n) complexity. If your data set is large, this one will be preferable for performance, though it is more complex to implement and therefore more error-prone.
1) Sort the array - this is the O(n log n) part if you use a good sorting algorithm. Do so by reference - you want to keep the original. This moves all identical values together. Break sort-order ties by position in the original array; this will help in the next step.
2) Iterate once over the sorted array (O(n)), looking for runs of the same value. Because these runs are themselves sorted by position, you can trivially find each cycle involving that value by comparing adjacent pairs for equality. Erase (not delete) each cycle from the original array by replacing each value except the last with a sentinel (zero might work). Don't close the gaps yet, or the references will break.
NB: At this stage you need to ignore any endpoints that have already been erased from the array. Because they will resolve to sentinels, you simply have to be careful to not erase "runs" that involve the sentinel value at either end.
3) Throw away the sorted array, and use the sentinels to close the gaps in the original array. This should be O(n).
Actually implementing this in any given language is left as an exercise for the reader. :-)
