Slow radix sort in C

I have to sort the numbers in an array in ascending order, and my time complexity has to be O(n). I'm using radix sort, but it's not fast enough. Any ideas how I could make my code faster? Here it is:
void radix(int *a, int n) {
    int i;
    int sorted[n];
    int number = 1;
    int biggestNumber = -1;
    for (i = 0; i < n; i++) {
        if (a[i] > biggestNumber)
            biggestNumber = a[i];
    }
    while (biggestNumber / number > 0) {
        int bucket[10] = { 0 };
        for (i = 0; i < n; i++)
            bucket[(a[i] / number) % 10]++;
        for (i = 1; i < 10; i++)
            bucket[i] += bucket[i - 1];
        for (i = n - 1; i >= 0; i--)
            sorted[--bucket[(a[i] / number) % 10]] = a[i];
        for (i = 0; i < n; i++)
            a[i] = sorted[i];
        number *= 10;
    }
}

Comment: The sort appears to work only with non-negative numbers. If a[i] is negative, (a[i] / number) % 10 is negative, so a negative index is used for bucket[...] and sorted[...]. You could change this to sort unsigned integers if signed integers are not required. There's no check for overflow on number *= 10. sorted is allocated on the stack (as a variable-length array), which won't work if n is large; use malloc() to allocate space for sorted.
To make the sort faster:
Change the base of the radix from 10 to 256. To avoid possible overflow, declare number as unsigned and check for 0 == (number *= 256) to break out of the loop (the unsigned multiplication wraps to 0 instead of overflowing).
Alternate the direction of the radix sort on each pass: the 1st pass goes from a to sorted, the next pass from sorted back to a. This is easiest with a pair of pointers that are swapped after each pass; once the sort is complete, check whether the sorted data ended up in a[], and if not, copy it from sorted[] to a[].
Make bucket a matrix. Assuming ints are 32 bits, and the base is 256, then bucket would be [4][256]. This allows a single pass over a[] to create the bucket matrix. If ints are 64 bits, bucket would be [8][256].
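Putting the three suggestions together gives something like the following minimal sketch. It assumes 32-bit unsigned keys (uint32_t), which also sidesteps the negative-index problem; radix256 is a hypothetical name. All four byte histograms are built in one read pass over a[], the working buffer comes from malloc(), and the source/destination pointers are swapped after each pass. Because the loop always runs exactly 4 passes, the number *= 256 overflow check is not needed here. To sort signed int values, one common trick (not shown) is to flip the sign bit of each key before and after sorting.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort, base 256, for 32-bit unsigned keys (hypothetical name).
   Returns 0 on success, -1 if the working buffer cannot be allocated. */
int radix256(uint32_t *a, size_t n)
{
    size_t count[4][256] = {{0}};   /* one histogram per byte: bucket[4][256] */
    uint32_t *tmp, *src, *dst, *t;
    size_t i, sum, c;
    int pass, d;

    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);  /* heap, not stack, so large n is fine */
    if (tmp == NULL)
        return -1;

    for (i = 0; i < n; i++)         /* single read pass builds all 4 histograms */
        for (pass = 0; pass < 4; pass++)
            count[pass][(a[i] >> (8 * pass)) & 0xff]++;

    for (pass = 0; pass < 4; pass++) {  /* counts -> starting offsets */
        sum = 0;
        for (d = 0; d < 256; d++) {
            c = count[pass][d];
            count[pass][d] = sum;
            sum += c;
        }
    }

    src = a;
    dst = tmp;
    for (pass = 0; pass < 4; pass++) {
        for (i = 0; i < n; i++) {   /* stable scatter by the current byte */
            uint32_t v = src[i];
            dst[count[pass][(v >> (8 * pass)) & 0xff]++] = v;
        }
        t = src;                    /* alternate direction on each pass */
        src = dst;
        dst = t;
    }
    /* after an even number of passes the data is back in a; the check keeps
       this correct if the number of passes is ever made odd */
    if (src != a)
        memcpy(a, src, n * sizeof *a);
    free(tmp);
    return 0;
}
```

Each of the 4 passes is a linear scan, so the total work is O(n) for fixed-width keys, with far fewer passes than the base-10 version needs for large values.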

Related

Is it possible to do radix sort starting from the first digit rather than the last digit?

I know radix sort is usually done by looking at the right-most digit of a number first. I am wondering if it can be done by starting from the left-most digit instead. For example, take the number 2567. Normally we would first look at the 1's place and put the number into a "bucket", but can we start by looking at the 1000's place first, i.e. the digit 2?
The answer is yes, but it would be less convenient and efficient than sorting from less significant to most significant.
To sort from the most significant digit to the least significant, you would first distribute (bucket) the values by their most significant digit, with all values treated as having the same number of digits; for your example value this is the thousands digit. The result is 10 bins of values. Then, for each bin that contains more than one value, you distribute its values into another 10 bins using the second digit, and so on.
This requires maintaining one set of bins per digit level while sorting, and you need to examine more bins than with LSD radix sort. With LSD radix sort we need only one set of 10 bins, reused on every pass, and we visit at most d*10 bins, where d is the number of digits.
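The bin-per-digit procedure described above can be sketched recursively. This is a minimal illustration, assuming unsigned int keys and decimal digits to match the 2567 example; msd_sort and msd_radix_sort are hypothetical names. Instead of keeping a separate table alive for every digit level, it reuses one temporary buffer: each call distributes its slice into 10 bins, copies the bins back, and then recurses into each bin on the next digit.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sort a[0..n) on the decimal digit selected by div ((x / div) % 10),
   then recurse into each bin on the next lower digit. tmp holds >= n items. */
static void msd_sort(unsigned *a, unsigned *tmp, size_t n, unsigned div)
{
    size_t count[10] = {0}, start[10], pos[10];
    size_t i;
    int d;

    if (n < 2 || div == 0)              /* bin is sorted, or no digits left */
        return;
    for (i = 0; i < n; i++)             /* histogram of this digit */
        count[(a[i] / div) % 10]++;
    start[0] = 0;
    for (d = 1; d < 10; d++)
        start[d] = start[d - 1] + count[d - 1];
    memcpy(pos, start, sizeof pos);
    for (i = 0; i < n; i++)             /* distribute into 10 bins */
        tmp[pos[(a[i] / div) % 10]++] = a[i];
    memcpy(a, tmp, n * sizeof *a);      /* copy back before tmp is reused */
    for (d = 0; d < 10; d++)            /* each bin, next digit to the right */
        msd_sort(a + start[d], tmp, count[d], div / 10);
}

/* Returns 0 on success, -1 on allocation failure. */
int msd_radix_sort(unsigned *a, size_t n)
{
    unsigned max = 0, div = 1;
    unsigned *tmp;
    size_t i;

    for (i = 0; i < n; i++)
        if (a[i] > max)
            max = a[i];
    while (max / div >= 10)             /* div = weight of the leading digit */
        div *= 10;
    tmp = malloc((n ? n : 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    msd_sort(a, tmp, n, div);
    free(tmp);
    return 0;
}
```

For 2567 the loop stops at div = 1000, so the first call buckets on the digit 2, exactly as in the question.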
It can be done, but there is more overhead. Assuming a counting (read) pass is done before each radix sort pass, variable-size bins can be created to avoid wasting space. The most significant digit results in 10 bins, the second most significant digit in 100 bins, the third in 1000 bins, and the fourth in 10000 bins.
For arrays of certain sizes, doing one or two leading "digits" first can produce bins that fit within the processor's cache, which helps the remaining radix sort passes, since those do random-access writes. The rest of the radix sort on each bin would then proceed least significant digit first.
Example code to sort 32-bit unsigned integers, where 256 bins are created and sorted by the most significant byte, then each of the 256 bins is sorted by the 3 least significant bytes. Again, the premise is that the array is much larger than the cache, but each of the 256 bins fits in cache. a is the array to be sorted; b is an allocated working array the same size as a.
#include <stddef.h>
#include <stdint.h>
void RadixSort3(uint32_t *a, uint32_t *b, size_t count);
// split array into 256 bins according to most significant byte
void RadixSort(uint32_t *a, uint32_t *b, size_t count)
{
    size_t aIndex[260] = {0};               // count / array
    size_t i;
    for (i = 0; i < count; i++)             // generate histogram
        aIndex[1 + ((size_t)(a[i] >> 24))]++;
    for (i = 2; i < 257; i++)               // convert to indices
        aIndex[i] += aIndex[i - 1];
    for (i = 0; i < count; i++)             // sort by msb
        b[aIndex[a[i] >> 24]++] = a[i];
    for (i = 256; i; i--)                   // restore aIndex
        aIndex[i] = aIndex[i - 1];
    aIndex[0] = 0;
    for (i = 0; i < 256; i++)               // radix sort the 256 bins (result lands back in a)
        RadixSort3(&b[aIndex[i]], &a[aIndex[i]], aIndex[i + 1] - aIndex[i]);
}
// sort a bin by 3 least significant bytes
void RadixSort3(uint32_t *a, uint32_t *b, size_t count)
{
    size_t mIndex[3][256] = {{0}};          // count / matrix
    size_t i, j, m, n;
    uint32_t u, *t;
    if (count == 0)
        return;
    for (i = 0; i < count; i++) {           // generate histograms
        u = a[i];
        for (j = 0; j < 3; j++) {
            mIndex[j][(size_t)(u & 0xff)]++;
            u >>= 8;
        }
    }
    for (j = 0; j < 3; j++) {               // convert to indices
        m = 0;
        for (i = 0; i < 256; i++) {
            n = mIndex[j][i];
            mIndex[j][i] = m;
            m += n;
        }
    }
    for (j = 0; j < 3; j++) {               // radix sort
        for (i = 0; i < count; i++) {       // sort by current lsb
            u = a[i];
            m = (size_t)(u >> (j << 3)) & 0xff;
            b[mIndex[j][m]++] = u;
        }
        t = a;                              // swap ptrs (std::swap is C++ only)
        a = b;
        b = t;
    }
    // 3 passes (odd): the sorted bin ends up in the caller's b, i.e. the original a[]
}
Yes. There is virtually no difference unless you know something about the distribution of the numbers to be sorted. You can also just invert the sequence of the numbers, sort by least significant digit, and invert the sequence again, adding just a small constant offset; the complexity stays the same.

Selection Sort Counting comparisons

I have a program with Selection Sort which generates random numbers and sorts them in ascending and descending order. The problem is with counting the comparisons: it gives the correct number up to 10,000 numbers, but when I generate 100k numbers, it returns a different value from the one given by the formula.
Here is my Selection Sort code.
void select (int n, float *pole2, int *compare, int *move, char decide)
{
    *compare = 0; // number of comparisons
    *move = 0;
    int i;
    for (i = 0; i < n - 1; i++)
    {
        int j, poz_min;
        float temp, min;
        min = pole2[i];
        poz_min = i;
        for (j = i + 1; j < n; j++)
        {
            *compare += 1;
            if (pole2[j] < min)
            {
                min = pole2[j];
                *move += 1;
                poz_min = j;
            }
        }
        temp = pole2[i];
        pole2[i] = pole2[poz_min];
        pole2[poz_min] = temp;
        *move += 3;
    }
    // Writing to a binary file
    FILE *fw;
    fw = fopen("Select_SORT.DAT", "wb+");
    if (fw == NULL)
        return;
    int z;
    for (z = 0; z < n; z++)
    {
        fwrite(&pole2[z], sizeof(pole2[z]), 1, fw);
    }
    fclose(fw); // fw must not be used after fclose, so no fseek here
}
Well, that's because for 100K elements there are actually about 5 × 10^9 comparisons (100000 × 99999 / 2 = 4,999,950,000). An int on your system can't hold that. Try using long long to be safe. Also compare what you get with INT_MAX; you will get the idea.
For n elements, selection sort makes O(n^2) comparisons (n*(n-1)/2, to be precise).
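A minimal sketch of the fix, assuming the question's counting rules (one comparison per inner-loop iteration, one move per new minimum, three moves per swap). The name select_sort_count is hypothetical; it also avoids clashing with the POSIX select() function. With long long counters, n*(n-1)/2 fits easily even for n = 100000.

```c
/* Selection sort with 64-bit counters (hypothetical name). */
void select_sort_count(float *v, int n, long long *compare, long long *move)
{
    int i, j, poz_min;
    float min, temp;

    *compare = 0;
    *move = 0;
    for (i = 0; i < n - 1; i++) {
        min = v[i];
        poz_min = i;
        for (j = i + 1; j < n; j++) {
            *compare += 1;              /* one comparison per inner iteration */
            if (v[j] < min) {
                min = v[j];
                *move += 1;             /* counted as in the question */
                poz_min = j;
            }
        }
        temp = v[i];                    /* swap: counted as 3 moves */
        v[i] = v[poz_min];
        v[poz_min] = temp;
        *move += 3;
    }
}
```

Print the counters with "%lld", and compute the expected value as (long long)n * (n - 1) / 2; the cast has to happen before the multiplication, otherwise the product itself overflows int.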
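At first glance, int *compare may simply be too small: an int is only guaranteed to hold values up to 32767, and even a 32-bit int tops out at 2,147,483,647 (INT_MAX).
Try long, or better long long, since long is also only 32 bits on some platforms (such as 64-bit Windows).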
https://en.wikipedia.org/wiki/C_data_types

generating graph with random edges

I am trying to write a C/C++ program that uses disjoint sets (union by rank and path compression) and then applies Kruskal's algorithm to a graph. I have already generated number_of_vertices-1 pairs (0,1),(1,2)...(n-2,n-1) as edges in the graph in order to make it connected. I need to generate the remaining 3*number_Of_Vertices+1 random edges as pairs (vertex1,vertex2) without collisions (the same edge must not be generated twice). I have to do this without using extra memory, by which I mean an extra list, vector, etc. Do you have any idea how to do this?
This is what I have so far, but it certainly produces collisions:
edge** createRandomEdges(nodeG **nodeArray, int n) {
    edge **edgeArray = (edge**)malloc(sizeof(edge*) * n * 4);
    for (int i = 0; i < n; i++)
        edgeArray[i] = createEdge(nodeArray[0], nodeArray[i + 1], rand() % 100 + 1);
    for (int i = n; i < 4 * n; i++) {
        int nodeAindex = rand() % n;
        int nodeBindex = rand() % n;
        while (nodeAindex == nodeBindex) {
            nodeAindex = rand() % n;
            nodeBindex = rand() % n;
        }
        int weight = rand() % 100 + 1;
        edgeArray[i] = createEdge(nodeArray[nodeAindex], nodeArray[nodeBindex], weight);
    }
    return edgeArray;
}
So you have N candidate edges and want to pick K of them while minimizing memory consumption. In this case you can use reservoir sampling, which needs O(K) memory.
Make an array of size K, fill it with the first K items (e.g. the numbers 0..K-1), then loop over the remaining items and randomly replace some entries, using rules that keep the selection uniform:
ReservoirSample(S[1..n], R[1..k])
    // fill the reservoir array
    for i = 1 to k
        R[i] := S[i]
    // replace elements with gradually decreasing probability
    for i = k+1 to n
        j := random(1, i)   // important: inclusive range
        if j <= k
            R[j] := S[i]
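Applied to the question, the "items" are the candidate edges, streamed one at a time. A minimal sketch, assuming vertices are numbered 0..n-1 and that the path edges (i, i+1) already exist, so only pairs with v >= u + 2 are candidates. EdgePair, sample_edges and the xorshift64 generator are hypothetical stand-ins; rand() alone may not reach indices above RAND_MAX, which can be as small as 32767. Memory use is just the k-entry reservoir, but time is O(n²) because every candidate pair is visited once.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int u, v; } EdgePair;  /* hypothetical; the question has its own edge type */

static uint64_t rng_state = 88172645463325252ULL;  /* any nonzero seed */

/* xorshift64 PRNG (assumption: any decent 64-bit generator would do) */
static uint64_t next_rand(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Pick k distinct undirected edges (u < v, not adjacent on the path) among
   n vertices, uniformly at random. Returns how many were chosen, which is
   less than k only if there are fewer than k candidates. */
size_t sample_edges(int n, EdgePair *r, size_t k)
{
    size_t seen = 0;                    /* candidates streamed so far */
    int u, v;

    for (u = 0; u < n; u++) {
        for (v = u + 2; v < n; v++) {   /* v = u+1 skipped: path edge exists */
            if (seen < k) {             /* fill the reservoir */
                r[seen].u = u;
                r[seen].v = v;
            } else {                    /* j uniform in 0..seen inclusive */
                uint64_t j = next_rand() % ((uint64_t)seen + 1);
                if (j < k) {
                    r[j].u = u;
                    r[j].v = v;
                }
            }
            seen++;
        }
    }
    return seen < k ? seen : k;
}
```

Because each candidate pair is generated exactly once, the reservoir can never contain a duplicate edge, which is the collision-free property the question asks for. The modulo introduces a bias too small to matter for 64-bit output.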

How to find the number of elements in the array that are bigger than all elements after it?

I have a function that takes a one-dimensional array of N positive integers and returns the number of elements that are larger than all the elements after them. Is there a way to do this in better time? My code is the following:
int count(int *p, int n) {
    int i, j;
    int countNo = 0;
    int flag = 0;
    for (i = 0; i < n; i++) {
        flag = 1;
        for (j = i + 1; j < n; j++) {
            if (p[i] <= p[j]) {
                flag = 0;
                break;
            }
        }
        if (flag) {
            countNo++;
        }
    }
    return countNo;
}
My solution is O(n^2). Can it be done better?
You can solve this problem in linear (O(n)) time. Note that the last number in the array always fits the problem definition, since nothing comes after it, so for a non-empty array the function always returns at least 1.
For any other number in the array to be valid, it must be strictly greater than the greatest number that comes after it in the array.
So iterate over the array from right to left, keeping track of the greatest number found so far, and increment the counter whenever the current number is strictly greater than that maximum.
Working code
#include <limits.h>
int count2(int *p, int n) {
    int max = INT_MIN; // stands in for negative infinity (all elements are positive)
    int cnt = 0;
    int i;
    for (i = n - 1; i >= 0; i--) {
        if (p[i] > max) { // strictly greater than everything to its right
            cnt++;
            max = p[i];
        }
    }
    return cnt;
}
Time complexity : O(n)
Space complexity : O(1)
It can be done in O(n).
int count(int *p, int n) {
    int i, currentMax;
    int countNo;
    if (n <= 0)
        return 0;
    currentMax = p[n-1];
    countNo = 1; // the last element always counts
    for (i = n-2; i >= 0; i--) {
        if (currentMax < p[i])
        {
            countNo++;
            currentMax = p[i];
        }
    }
    return countNo;
}
Create an auxiliary array aux:
aux[i] = max{arr[i+1], ..., arr[n-1]}   (for i = n-1 the set is empty, so aux[n-1] acts as negative infinity)
It can be filled in linear time by scanning the array from right to left.
Now you only need to count the elements such that arr[i] > aux[i].
This is done in O(n).
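The auxiliary-array approach can be sketched as follows, assuming int elements; count_aux is a hypothetical name. Unlike the running-maximum scans in the other answers, it uses O(n) extra memory, so its main value is making the definition explicit. Since aux[n-1] would be the maximum of an empty set, the last element is counted separately instead of using a sentinel.

```c
#include <stdlib.h>

/* Count elements strictly greater than every element after them.
   Returns -1 if the auxiliary array cannot be allocated. */
int count_aux(const int *arr, int n)
{
    int *aux;
    int i, cnt;

    if (n <= 0)
        return 0;
    if (n == 1)
        return 1;
    aux = malloc((size_t)n * sizeof *aux);
    if (aux == NULL)
        return -1;
    /* aux[i] = max{arr[i+1], ..., arr[n-1]} for i <= n-2, right to left */
    aux[n - 2] = arr[n - 1];
    for (i = n - 3; i >= 0; i--)
        aux[i] = arr[i + 1] > aux[i + 1] ? arr[i + 1] : aux[i + 1];
    cnt = 1;                            /* arr[n-1] has nothing after it */
    for (i = 0; i <= n - 2; i++)
        if (arr[i] > aux[i])
            cnt++;
    free(aux);
    return cnt;
}
```

For {16, 17, 4, 3, 5, 2} the qualifying elements are 17, 5 and 2, so the function returns 3.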
Walk backwards through the array and keep track of the current maximum. Whenever you find a new maximum, that element is larger than all the elements following it.
Yes, it can be done in O(N) time. I'll give you an approach on how to go about it. If I understand your question correctly, you want the number of elements that are larger than all the elements that come next in the array provided the order is maintained.
So:
Let len = length of array x
{..., x[i], x[i+1], ..., x[len-1]}
We want the count of all elements x[i] such that x[i] > x[i+1], x[i] > x[i+2], and so on up to x[len-1].
Start traversing the array from the end i.e. at i = len -1 and keep track of the largest element that you've encountered.
It could be something like this:
max = x[len-1]   // a sentinel max
// Start a loop from i = len-1 down to i = 0:
if (x[i] > max)
    max = x[i]   // update max as you encounter elements
// Now consider the situation where we are in the middle of the array at some i = j
{..., x[j], ..., x[len-1]}
// Right now max holds the largest of the elements from i = j+1 to len-1
So when you encounter an x[j] that is larger than max, you've found an element that's larger than all the elements after it. You can just keep a counter and increment it when that happens.
Pseudocode to show the flow of algorithm:
counter = 1          // x[len-1] has nothing after it, so it always counts
i = length of array x - 1
max = x[i]
i = i - 1
while (i >= 0) {
    if (x[i] > max) {
        max = x[i]   // update max
        counter++    // update counter
    }
    i--
}
So ultimately counter will have the number of elements you require.
I hope this explains how to go about it. Coding it should be a fun exercise as a starting point.

How do the functions work?

Could you explain how the following two functions work?
int countSort(int arr[], int n, int exp)
{
    int output[n];
    int i, count[n];
    for (int i=0; i < n; i++)
        count[i] = 0;
    for (i = 0; i < n; i++)
        count[ (arr[i]/exp)%n ]++;
    for (i = 1; i < n; i++)
        count[i] += count[i - 1];
    for (i = n - 1; i >= 0; i--)
    {
        output[count[ (arr[i]/exp)%n] - 1] = arr[i];
        count[(arr[i]/exp)%n]--;
    }
    for (i = 0; i < n; i++)
        arr[i] = output[i];
}
void sort(int arr[], int n)
{
    countSort(arr, n, 1);
    countSort(arr, n, n);
}
I wanted to apply the algorithm to this array:
After calling countSort(arr, n, 1), we get this:
When I then call countSort(arr, n, n), in this for loop:
for (i = n - 1; i >= 0; i--)
{
    output[count[ (arr[i]/exp)%n] - 1] = arr[i];
    count[(arr[i]/exp)%n]--;
}
I get output[-1]=arr[4].
But the array doesn't have such a position...
Have I done something wrong?
EDIT: Considering the array arr[] = { 10, 6, 8, 2, 3 }, the array count will contain the following elements:
What do these numbers represent? How do we use them?
Counting sort is very easy - let's say you have an array which contains numbers from range 1..3:
[3,1,2,3,1,1,3,1,2]
You can count how many times each number occurs in the array:
count[1] = 4
count[2] = 2
count[3] = 3
Now you know that in a sorted array,
number 1 will occupy positions 0..3 (from 0 to count[1] - 1), followed by
number 2 on positions 4..5 (from count[1] to count[1] + count[2] - 1), followed by
number 3 on positions 6..8 (from count[1] + count[2] to count[1] + count[2] + count[3] - 1).
Now that you know the final position of every number, you can just insert each number at its correct position. That's basically what the countSort function does.
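The three steps above (count, compute starting positions, place) can be sketched for the 1..3 example like this. It is a minimal illustration, assuming keys in 1..MAX_KEY with MAX_KEY = 3; counting_sort_1k is a hypothetical name. It fills each range from its start going forward, whereas the question's countSort turns the counts into end positions and fills backwards; both placements are stable and give the same result.

```c
#include <stddef.h>

enum { MAX_KEY = 3 };   /* keys are assumed to lie in 1..MAX_KEY */

void counting_sort_1k(const int *in, int *out, size_t n)
{
    size_t count[MAX_KEY + 1] = {0};    /* count[v] = occurrences of v; index 0 unused */
    size_t pos[MAX_KEY + 1];            /* pos[v] = next free output slot for v */
    size_t i;
    int v;

    for (i = 0; i < n; i++)
        count[in[i]]++;
    pos[0] = 0;                         /* unused */
    pos[1] = 0;                         /* value 1 starts at position 0 */
    for (v = 2; v <= MAX_KEY; v++)      /* value v starts right after all smaller values */
        pos[v] = pos[v - 1] + count[v - 1];
    for (i = 0; i < n; i++)             /* drop each value into its next free slot */
        out[pos[in[i]]++] = in[i];
}
```

For [3,1,2,3,1,1,3,1,2] the counts are 4, 2 and 3, so the 1s start at position 0, the 2s at 4 and the 3s at 6, giving [1,1,1,1,2,2,3,3,3].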
However, in real life your input array would not contain just numbers from range 1..3, so the solution is to sort numbers on the least significant digit (LSD) first, then LSD-1 ... up to the most significant digit.
This way you can sort bigger numbers by sorting numbers from range 0..9 (single digit range in decimal numeral system).
This code: (arr[i]/exp)%n in countSort is used just to get those digits. n is the base of your numeral system, so for decimal you should use n = 10, and exp should start at 1 and be multiplied by the base in every iteration to get consecutive digits.
For example, to get the third digit from the right, we use n = 10 and exp = 10^2:
x = 1234,
(x/exp)%n = 2.
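The same extraction, (x / exp) % base, drives every pass. The sketch below (function names digit, radix_pass and sort_below_n_squared are hypothetical; unsigned keys assumed) also shows one way to read the question's sort(), which calls countSort with exp = 1 and then exp = n: with base n, any value in 0..n²-1 has exactly two base-n digits, so two stable passes are enough. All values in {10, 6, 8, 2, 3} are below 5² = 25.

```c
#include <stdlib.h>

/* Digit of x selected by exp in the given base, e.g. digit(1234, 100, 10) == 2 */
unsigned digit(unsigned x, unsigned exp, unsigned base)
{
    return (x / exp) % base;
}

/* One stable counting-sort pass on the digit selected by exp.
   Returns -1 on allocation failure. */
static int radix_pass(unsigned *a, size_t n, unsigned exp, unsigned base)
{
    size_t *count = calloc(base, sizeof *count);
    unsigned *out = malloc(n * sizeof *out);
    size_t i;
    unsigned d;

    if (count == NULL || out == NULL) {
        free(count);
        free(out);
        return -1;
    }
    for (i = 0; i < n; i++)
        count[digit(a[i], exp, base)]++;
    for (d = 1; d < base; d++)          /* count[d] = end of bucket d */
        count[d] += count[d - 1];
    for (i = n; i-- > 0; )              /* walk backwards to keep the pass stable */
        out[--count[digit(a[i], exp, base)]] = a[i];
    for (i = 0; i < n; i++)
        a[i] = out[i];
    free(count);
    free(out);
    return 0;
}

/* Values in 0..n*n-1: two base-n passes (exp = 1, then exp = n) sort them. */
int sort_below_n_squared(unsigned *a, size_t n)
{
    if (n < 2)
        return 0;
    if (radix_pass(a, n, 1, (unsigned)n) != 0)
        return -1;
    return radix_pass(a, n, (unsigned)n, (unsigned)n);
}
```

For general input, the decimal version (base 10, exp multiplied by 10 until it exceeds the maximum) uses the same radix_pass logic with more passes.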
This algorithm is called Radix sort and is explained in detail on Wikipedia: http://en.wikipedia.org/wiki/Radix_sort
It took a bit of time to pick through your countSort routine and determine just what you were doing compared to a normal radix sort. Some versions split the iteration and the actual sort routine, which appears to be what you attempted with your countSort and sort functions. However, after going through that exercise, it was clear you had just missed some necessary parts of the sort routine. After fixing various compile/declaration issues in your original code, the following adds the pieces you overlooked.
In your countSort function, the size of your count array was wrong. It must be the size of the base, in this case 10 (you had 5). You confused the use of exp and base throughout the function. The exp variable steps through the powers of 10, which lets you get the value at each digit position of each element when combined with a modulo base operation. You had modulo n instead. This problem also permeated your loop ranges, where a number of your loop indexes iterated over 0..n-1 when the correct range was 0..base-1.
You also missed finding the maximum value in the original array, which is used to limit the number of passes through the array. In fact, all of your existing loops in countSort must fall within an outer loop iterating while (m / exp > 0). Lastly, you omitted an increment of exp within that outer loop, which is necessary for applying the sort to each digit position. I guess you just got confused, but I commend your effort in attempting to rewrite the sort routine rather than just copy/pasting it from somewhere else. (If you did copy/paste, you have additional problems...)
With each of those issues addressed, the sort works. Look through the changes and understand what it is doing. Radix sort and counting sort are distribution sorts: they rely on where numbers occur and manipulate indexes rather than comparing values against one another, which makes this type of sort awkward to understand at first. Let me know if you have any questions. I tried to preserve your naming convention throughout the function, adding a couple of variables that were omitted and avoiding hardcoding 10 as the base.
#include <stdio.h>
void prnarray (int *a, int sz);
void countSort (int arr[], int n, int base)
{
    int exp = 1;
    int m = arr[0];
    int output[n];
    int count[base];
    int i;
    for (i = 1; i < n; i++)                   /* find the maximum value */
        m = (arr[i] > m) ? arr[i] : m;
    while (m / exp > 0)
    {
        for (i = 0; i < base; i++)
            count[i] = 0;                     /* zero bucket array (count) */
        for (i = 0; i < n; i++)
            count[ (arr[i]/exp) % base ]++;   /* count keys to go in each bucket */
        for (i = 1; i < base; i++)            /* indexes after end of each bucket */
            count[i] += count[i - 1];
        for (i = n - 1; i >= 0; i--)          /* map bucket indexes to keys */
        {
            output[count[ (arr[i]/exp) % base] - 1] = arr[i];
            count[(arr[i]/exp) % base]--;     /* same bucket as above: modulo base, not n */
        }
        for (i = 0; i < n; i++)               /* fill array with sorted output */
            arr[i] = output[i];
        exp *= base;                          /* inc exp for next group of keys */
    }
}
int main (void) {
    int arr[] = { 10, 6, 8, 2, 3 };
    int n = 5;
    int base = 10;
    printf ("\n The original array is:\n\n");
    prnarray (arr, n);
    countSort (arr, n, base);
    printf ("\n The sorted array is\n\n");
    prnarray (arr, n);
    printf ("\n");
    return 0;
}
void prnarray (int *a, int sz)
{
    register int i;
    printf (" [");
    for (i = 0; i < sz; i++)
        printf (" %d", a[i]);
    printf (" ]\n");
}
output:
$ ./bin/sort_count
The original array is:
[ 10 6 8 2 3 ]
The sorted array is
[ 2 3 6 8 10 ]
