Sorting columns (lexicographic comparison) in a matrix

I am stuck in the second part of this assignment. I think I have a problem with my algorithm.
Please let me know whether my code is heading in the right direction.
This is my assignment:
Given a two-dimensional array of integers with 5 rows and 10 columns, where each value is a random number between 0 and 20, write a program that sorts the array values as follows. First, arrange the values in each column so that they are in ascending order (top to bottom). Then sort the columns from left to right by comparing pairs of values in different columns in the same row (a "lexicographic comparison"): compare the two columns' values in the first row; if they are equal, compare the values in the second row, and so on, and change the order of the columns accordingly (see the example in the third printing of the array, below). Display the array before sorting and after each of the two phases. For example:
#include "stdio.h"
#include "conio.h"
#include "malloc.h"
#include "stdlib.h"
#include "time.h"
#define N 5
#define M 10
#define LOW 0
#define HIGH 20
void initRandomArray(int arr[N][M]);
void printArray(int arr[N][M]);
void SortInColumn(int arr[N][M],int m);
void SortColumns(int arr[][M]);
int compareColumns(int arr[][M], int col1, int col2);
void swapColumns( int col1, int col2);
int main()
{
int arr[N][M];
int m;
m=M;
srand((unsigned)time(NULL)); // seed the random number generator
initRandomArray(arr);
printf("Before sorting:\n");
printArray(arr);
printf("Sorting elements in each column:\n");
SortInColumn(arr,M);
printf("Sorting columns:\n");
SortColumns(arr);
system("pause");
return 0;
}
void initRandomArray(int arr[N][M])
{
int i,j;
for (i=0 ; i<N ; i++)
for (j=0 ; j<M ; j++)
{
arr[i][j]=LOW+rand()%(HIGH-LOW+1);
}
}
void printArray(int arr[N][M])
{
int i,j;
for (i=0 ; i<N ; i++)
{
for (j=0 ; j<M ; j++)
printf("%d ", arr[i][j]);
printf("\n");
}
}
void SortInColumn(int arr[][M],int m)
{
int i,j,k;
int temp;
for( k=0 ; k<m ; ++k) // loops around each columns
{
for(j=0; j<N-1; j++)// Loop for making sure we compare each column N-1 times since for each run we get one item in the right place
{
for(i=0; i < N-1 - j; i++) //loop do the adjacent comparison
{
if (arr[i][k]>arr[i+1][k]) // compare adjacent item
{
temp=arr[i][k];
arr[i][k]=arr[i+1][k];
arr[i+1][k]=temp;
}
}
}
}
printArray(arr);
}
void SortColumns(int arr[][M])
{ int row=0,cols=0,i=0,n=N;
int col1=arr[row][cols];
int col2=arr[row][cols];
compareColumns(arr,col1,col2);
}
int compareColumns(int arr[][M], int col1, int col2)
{
int row=0,cols=0,j;
for ( row=0 ; row < N ; row ++ );
{
for( cols=0 ; cols < M-1 ; cols++)
{
if(arr[row][cols]>arr[row][cols+1])
{
for (j=0 ; j < M-1 ; j++)
{
col1=arr[row][cols];
col2=arr[row][cols+1];
swapColumns(col1 , col2 );
}
}
}
}
printArray(arr);
}
void swapColumns(int col1, int col2)
{
int temp;
temp=col1;
col1=col2;
col2=temp;
}
By the way, is the complexity of the compareColumns function O(n^3)?

That algorithm is too slow; you can do better.
You can exploit the fact that every number is between 0 and 20 to sort the columns in linear time. To do so, use an auxiliary 21x10 array (one row for each possible value 0 to 20), where array[i][j] holds how many times the value i appears in column j of the original matrix. For your example, the first column holds the values 12, 18, 13, 17, 13, thus we would have:
array[12][0] = 1
array[13][0] = 2
array[17][0] = 1
array[18][0] = 1
Building this array takes O(n) time for a matrix with n elements (in fact, the problem size is always the same: 5*10 = 50).
After building that array, you now have two possibilities:
a) Overwrite the original matrix with the sorted column values. For this, you would go into each column j in the auxiliary array, and scan the values. For the first column, for example, the first non-zero value is in array[12][0], which is 1, so you write "12" in the first column in the original array, and increment the row count so that you will write the next value in the correct position. Then, you'd see that array[13][0] is 2, so you'd write "13" twice in the original matrix. You keep doing this for every column. Now you have a matrix with sorted columns, and you can apply your method for lexicographic sorting between columns.
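A minimal sketch of option a), assuming the 5x10 matrix and the 0 to 20 value range from the question (ROWS, COLS, MAXVAL and sort_columns_by_counting are names chosen here, not taken from the original code). It builds the count array and then rewrites every column in ascending order:

```c
#include <string.h>

#define ROWS 5      /* N in the question */
#define COLS 10     /* M in the question */
#define MAXVAL 20   /* values lie in 0..MAXVAL */

/* Sort every column of m in ascending order (top to bottom) by counting how
 * often each value occurs in the column, then writing the values back in
 * increasing order. */
void sort_columns_by_counting(int m[ROWS][COLS])
{
    int count[MAXVAL + 1][COLS];
    memset(count, 0, sizeof count);

    /* count[v][j] = number of times value v appears in column j */
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            count[m[i][j]][j]++;

    /* rewrite each column, smallest value first */
    for (int j = 0; j < COLS; j++) {
        int row = 0;
        for (int v = 0; v <= MAXVAL; v++)
            for (int c = 0; c < count[v][j]; c++)
                m[row++][j] = v;
    }
}
```

The resulting matrix can then be handed to whatever lexicographic column sort you use.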
b) Since you want lexicographic order between columns, you can give each column a single numeric key that orders the same way. The plain sum of a column's values does not work: a column starting 0, 20, ... has a larger sum than one starting 1, 1, ... yet comes first lexicographically. Instead, read each sorted column from top to bottom as the digits of a base-21 number (each digit is between 0 and 20). Because every column has the same number of digits, comparing these numbers is the same as comparing the columns lexicographically, and you can compute the key directly from the auxiliary array by walking array[0..20][j] in order. Store an additional array of 10 elements, where each element is a structure holding the key and the original column index j. Sorting this 10-element array is very fast; then iterate through it and, for each entry, use the stored index j to read column j of the auxiliary array and write its values into the next column of the original matrix, using the method described in a). This has the advantage that you don't need to write any sorting procedure at all; you can use your system's qsort.
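A sketch of option b) under the same assumptions (5 rows, 10 columns, values 0 to 20). Every column holds exactly 5 values, so reading a sorted column from top to bottom as a base-21 number gives a key that orders columns exactly like the lexicographic comparison (a plain sum of the values would not preserve that order). The names struct col_key, cmp_col_key and sort_matrix_by_keys are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

#define ROWS 5
#define COLS 10
#define BASE 21   /* values lie in 0..20, so each one is a single base-21 digit */

struct col_key {
    long key;   /* sorted column read top to bottom as a base-21 number */
    int index;  /* original column index j */
};

static int cmp_col_key(const void *a, const void *b)
{
    const struct col_key *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Sorts the values inside each column, then orders the columns lexicographically. */
void sort_matrix_by_keys(int m[ROWS][COLS])
{
    int count[BASE][COLS];
    struct col_key keys[COLS];
    memset(count, 0, sizeof count);

    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            count[m[i][j]][j]++;

    /* walking the values in ascending order visits the sorted column top to bottom */
    for (int j = 0; j < COLS; j++) {
        keys[j].key = 0;
        keys[j].index = j;
        for (int v = 0; v < BASE; v++)
            for (int c = 0; c < count[v][j]; c++)
                keys[j].key = keys[j].key * BASE + v;
    }

    qsort(keys, COLS, sizeof keys[0], cmp_col_key);

    /* write original column keys[t].index, in sorted order, into column t */
    for (int t = 0; t < COLS; t++) {
        int j = keys[t].index, row = 0;
        for (int v = 0; v < BASE; v++)
            for (int c = 0; c < count[v][j]; c++)
                m[row++][t] = v;
    }
}
```

The largest possible key is 21^5 - 1 = 4,084,100, so a long is more than wide enough.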
This solution scales well for reasonable value ranges. For a matrix with N elements and C columns, building the auxiliary array takes O(N) time, and the per-column pass over it is also linear for a fixed value range. Sorting the C-element array takes O(C log C) time, and overwriting the original matrix takes O(N) time.
Note that this algorithm is quick enough, but it can consume a lot of memory when the range of values is very large. You might want to think about whether you can reduce memory usage at the cost of a longer runtime.
As for the code you posted, it's very incomplete; please try to provide something a little more polished.
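For comparison, here is one way the question's own approach could be completed. Several things stop the posted version from working: the stray semicolon after "for ( row=0 ; row < N ; row ++ )" makes that loop's body empty, col1 and col2 hold matrix values rather than column indices, and swapColumns swaps copies of two values, so the matrix never changes. This sketch keeps the question's function names but changes their parameters (an assumption about what was intended): compareColumns compares two columns row by row, swapColumns exchanges two whole columns in place, and SortColumns runs a bubble sort over the columns. It assumes the values inside each column are already sorted:

```c
#define N 5
#define M 10

/* Returns <0, 0 or >0 as column col1 is lexicographically smaller than,
 * equal to, or greater than column col2 (row 0 is compared first). */
int compareColumns(int arr[][M], int col1, int col2)
{
    for (int row = 0; row < N; row++) {
        if (arr[row][col1] != arr[row][col2])
            return arr[row][col1] < arr[row][col2] ? -1 : 1;
    }
    return 0;
}

/* Swaps two whole columns of the matrix (every row), not copies of values. */
void swapColumns(int arr[][M], int col1, int col2)
{
    for (int row = 0; row < N; row++) {
        int temp = arr[row][col1];
        arr[row][col1] = arr[row][col2];
        arr[row][col2] = temp;
    }
}

/* Bubble sort over the columns, using compareColumns as the ordering. */
void SortColumns(int arr[][M])
{
    for (int pass = 0; pass < M - 1; pass++)
        for (int col = 0; col < M - 1 - pass; col++)
            if (compareColumns(arr, col, col + 1) > 0)
                swapColumns(arr, col, col + 1);
}
```

With M columns and N rows, this bubble sort performs O(M^2) column comparisons, each costing up to N element comparisons.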

Related

Find the indices of the k smallest values in C

I'm implementing K Nearest Neighbor in C and I've gotten to the point where I've computed a distance matrix of every point in my to-be-labeled set of size m to every point in my already-labeled set of size n. The format of this matrix is
[[dist_0,0   ... dist_0,n-1  ]
 ...
 [dist_m-1,0 ... dist_m-1,n-1]]
Next, I need to find the k smallest distances in each row so I can use the column indices to access the labels of those points and then compute the label for the point the row index refers to. The latter part is trivial, but computing the indices of the k smallest distances has me stumped. Python has easy ways to do something like this, but the bare-bones nature of C has gotten me a bit frustrated. I'd appreciate some pointers (no pun intended) on how to go about this and on any helpful functions C might have.

Without knowing k, and assuming that it can be variable, the simplest way to do this would be to:
Organize each element in a structure which holds the original column index.
Sort each row of the matrix in ascending order and take the first k elements of that row.
#include <stdlib.h>
struct item {
unsigned value;
size_t index; /* original column index of this value */
};
/* qsort requires the comparator to take const void pointers. */
int compare_items(const void *a, const void *b) {
const struct item *item_a = a;
const struct item *item_b = b;
if (item_a->value < item_b->value)
return -1;
if (item_a->value > item_b->value)
return 1;
return 0;
}
// Your matrix (N rows, M columns):
struct item matrix[N][M];
/* Populate the matrix... make sure that each index is set to the
* element's column, e.g. matrix[i][j].index = j.
*/
size_t i, j;
for (i = 0; i < N; i++) {
qsort(matrix[i], M, sizeof(struct item), compare_items);
/* Now the i-th row is sorted and you can take a look
* at the first k elements of the row (k <= M).
*/
for (j = 0; j < k; j++) {
// Do something with matrix[i][j].index ...
}
}
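A self-contained sketch of the same idea for a single row, packaged as a hypothetical helper k_smallest_indices (the name, the unsigned distance type and the -1 error convention are assumptions). It sorts a copy of the row, so the distance matrix itself is left unchanged:

```c
#include <stdlib.h>

struct item {
    unsigned value;
    size_t index;   /* original column index */
};

static int compare_items(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    return (x->value > y->value) - (x->value < y->value);
}

/* Writes into out[0..k-1] the column indices of the k smallest values of row
 * (length n), smallest first. Returns 0 on success, -1 if k > n or if the
 * temporary copy cannot be allocated. */
int k_smallest_indices(const unsigned *row, size_t n, size_t k, size_t *out)
{
    if (k > n)
        return -1;
    if (n == 0)
        return 0;
    struct item *items = malloc(n * sizeof *items);
    if (items == NULL)
        return -1;
    for (size_t j = 0; j < n; j++) {
        items[j].value = row[j];
        items[j].index = j;
    }
    qsort(items, n, sizeof *items, compare_items);
    for (size_t j = 0; j < k; j++)
        out[j] = items[j].index;
    free(items);
    return 0;
}
```

Calling it once per row of the distance matrix gives the neighbor indices for each point to be labeled.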

How to sort an int array in linear time?

I was given a homework assignment to write a program that sorts an array in ascending order. I did this:
#include <stdio.h>
int main()
{
int a[100],i,n,j,temp;
printf("Enter the number of elements: ");
scanf("%d",&n);
for(i=0;i<n;++i)
{
printf("%d. Enter element: ",i+1);
scanf("%d",&a[i]);
}
for(j=0;j<n;++j)
for(i=j+1;i<n;++i)
{
if(a[j]>a[i])
{
temp=a[j];
a[j]=a[i];
a[i]=temp;
}
}
printf("Ascending order: ");
for(i=0;i<n;++i)
printf("%d ",a[i]);
return 0;
}
The input will not be more than 10 numbers. Can this be done with less code than I wrote here? I want the code to be as short as possible. Any help will be appreciated. Thanks!

If you know the range of the array elements, one way is to use another array to store the frequency of each value (all elements must be integers :) ) and then print the sorted array. I am posting a version for a large number of elements (10^6); you can reduce it according to your needs:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
int t, num, *freq = malloc(sizeof(int)*1000001);
if (freq == NULL) return 1;
memset(freq, 0, sizeof(int)*1000001); // Set all elements of freq to 0
scanf("%d",&t); // Number of elements to read (upper limit is 1000000)
for(int i = 0; i < t; i++){
scanf("%d", &num);
if (num >= 0 && num <= 1000000) // values must lie in 0..1000000
freq[num]++;
}
for(int i = 0; i < 1000001; i++){
if(freq[i]){
while(freq[i]--){
printf("%d\n", i);
}
}
}
free(freq);
return 0;
}
This algorithm can be modified further. The modified version is known as counting sort, and it sorts the array in Θ(n + k) time, where k is the largest value; that is Θ(n) when k = O(n).
Counting sort (note 1):
Counting sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. When k = O(n), the sort runs in Θ(n) time.
Counting sort determines, for each input element x, the number of elements less than x. It uses this information to place element x directly into its position in the output array. For example, if 17 elements are less than x, then x belongs in output position 18. We must modify this scheme slightly to handle the situation in which several elements have the same value, since we do not want to put them all in the same position.
In the code for counting sort, we assume that the input is an array A[1..n] and thus A.length = n. We require two other arrays: the array B[1..n] holds the sorted output, and the array C[0..k] provides temporary working storage.
The pseudocode for this algorithm:
for i ← 0 to k do
    C[i] ← 0
for j ← 1 to n do
    C[A[j]] ← C[A[j]] + 1
// C[i] now contains the number of elements equal to i
for i ← 1 to k do
    C[i] ← C[i] + C[i-1]
// C[i] now contains the number of elements ≤ i
for j ← n downto 1 do
    B[C[A[j]]] ← A[j]
    C[A[j]] ← C[A[j]] - 1
1. Content has been taken from Introduction to Algorithms by Thomas H. Cormen et al.
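The counting sort pseudocode translates to C roughly as follows. This is a sketch with 0-based arrays instead of the book's 1-based ones (hence the - 1 when placing an element); the name counting_sort and its -1 return on allocation failure are choices made here:

```c
#include <stdlib.h>

/* Counting sort of a[0..n-1], whose values must all lie in 0..k.
 * The sorted result is written to b[0..n-1]. Returns 0, or -1 if the
 * count array cannot be allocated. Runs in O(n + k) time. */
int counting_sort(const int *a, int *b, size_t n, int k)
{
    size_t *c = calloc((size_t)k + 1, sizeof *c);  /* C[0..k], all zero */
    if (c == NULL)
        return -1;
    for (size_t j = 0; j < n; j++)
        c[a[j]]++;                      /* c[i] = number of elements equal to i */
    for (int i = 1; i <= k; i++)
        c[i] += c[i - 1];               /* c[i] = number of elements <= i */
    for (size_t j = n; j-- > 0; ) {     /* walk backwards so the sort is stable */
        b[c[a[j]] - 1] = a[j];          /* - 1 because b is 0-indexed */
        c[a[j]]--;
    }
    free(c);
    return 0;
}
```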

You have 10 lines doing the sorting. If you're allowed to use someone else's work (subsequent notes indicate that you can't do this), you can reduce that by writing a comparator function and calling the standard C library qsort() function:
static int compare_int(void const *v1, void const *v2)
{
int i1 = *(const int *)v1;
int i2 = *(const int *)v2;
if (i1 < i2)
return -1;
else if (i1 > i2)
return +1;
else
return 0;
}
And then the call is:
qsort(a, n, sizeof(a[0]), compare_int);
Now, I wrote the function the way I did for a reason. In particular, it avoids the arithmetic overflow that this shorter version can suffer (for example, when comparing INT_MIN with 1):
static int compare_int(void const *v1, void const *v2)
{
return *(int *)v1 - *(int *)v2;
}
Also, the original pattern generalizes to comparing structures, etc. You compare the first fields and, if they are unequal, return the appropriate result; if the first fields are equal, you compare the second fields, then the third, up to the Nth, returning 0 only if every comparison shows the values are equal.
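To illustrate the multi-field pattern, here is a sketch of a comparator for a hypothetical struct person (the type and its fields are invented for this example), ordered by last name, then first name, then age:

```c
#include <string.h>

/* Hypothetical record type, just for illustration. */
struct person {
    char last[32];
    char first[32];
    int age;
};

/* Compare field by field; return 0 only if every field is equal. */
static int compare_person(void const *v1, void const *v2)
{
    const struct person *p1 = v1;
    const struct person *p2 = v2;
    int rc = strcmp(p1->last, p2->last);
    if (rc != 0)
        return rc;
    rc = strcmp(p1->first, p2->first);
    if (rc != 0)
        return rc;
    if (p1->age < p2->age)
        return -1;
    if (p1->age > p2->age)
        return +1;
    return 0;
}
```

It can be passed to qsort() exactly like compare_int.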
Obviously, if you're supposed to write the sort algorithm yourself, then you'll have to do a little more work than calling qsort(). Your algorithm is a Bubble Sort. It is one of the most inefficient sorting techniques: it is O(N^2). You can look up Insertion Sort (also O(N^2), but more efficient than Bubble Sort), Selection Sort (also quadratic), Shell Sort (very roughly O(N^(3/2))), Heap Sort (O(N lg N)), Quick Sort (O(N lg N) on average, but O(N^2) in the worst case), or Intro Sort. The only ones that might be shorter than what you wrote are Insertion Sort and Selection Sort; the others will be longer but faster for large amounts of data. For small sets like 10 or 100 numbers, efficiency is immaterial: any of these sorts will do. But as you get towards 1,000 or 1,000,000 entries, the choice of sorting algorithm really matters. You can find a lot of questions on Stack Overflow about different sorting algorithms, and Wikipedia has information on all of the algorithms mentioned.
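For a 10-element input, an insertion sort is about as short as a hand-written sort gets. A minimal sketch (the name insertion_sort is chosen here):

```c
/* Insertion sort: ascending, in place, O(N^2) comparisons in the worst case. */
void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        /* shift larger elements one slot to the right */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}
```

Calling insertion_sort(a, n); replaces the two nested loops in the question.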
Incidentally, if the input won't be more than 10 numbers, you don't need an array of size 100.

Efficient algorithms for selecting the top k (in percent) items from a data stream

I have to repeatedly sort an array containing 300 random elements, but I need a special kind of sort: I need the smallest 5% of the values from a subset of the array; then some value is calculated and the subset is enlarged. The value is calculated again and the subset enlarged again, and so on until the subset contains the whole array.
The subset starts with the first 10 elements and is increased by 10 elements after each step.
For example:
subset size | k = ceil(5% * subset size)
10 | 1 (just the smallest element)
20 | 1 (again just the smallest)
30 | 2 (smallest and second smallest)
...
...
The calculated value is basically the sum of the elements smaller than the k-th smallest element, plus the k-th smallest element multiplied by a special weight.
In code:
k = ceil(0.05 * subset) - 1; // -1 because array indices start at 0
temp = 0.0;
for (int i = 0; i < k; i++)
temp += smallestElements[i];
temp += b * smallestElements[k]; // the weighted k-th smallest element
I have implemented a selection-sort-based algorithm myself (code at the end of this post). I use max(k) pointers to keep track of the k smallest elements, so I unnecessarily keep all elements smaller than the k-th smallest sorted :/
Furthermore, I know selection sort is bad for performance, which is unfortunately crucial in my case.
I tried to figure out how I could use a quicksort- or heapsort-based algorithm. I know that quickselect or heapselect are perfect for finding the k smallest elements if k and the subset are fixed.
But because my subset behaves more like an input stream of data, I think quicksort-based algorithms are ruled out.
I know that heapselect would be perfect for a data stream if k were fixed. But I can't manage to adapt heapselect to a dynamic k without big performance drops, so it ends up less efficient than my selection-sort-based version :( Can anyone help me modify heapselect for a dynamic k?
If there is no better algorithm, maybe you can find a different/faster approach for my selection sort implementation. Here is a minimal example of my implementation; the calculated variable isn't used in this example, so don't worry about it. (In my real program I have manually unrolled some loops for better performance.)
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define ARRAY_SIZE 300
#define STEP_SIZE 10
float sortStream( float* array, float** pointerToSmallest, int k_max){
int i,j,k,last = k_max-1;
float temp=0.0;
// init first two pointers
if( array[0] < array[1] ){
pointerToSmallest[0] = &array[0];
pointerToSmallest[1] = &array[1];
}else{
pointerToSmallest[0] = &array[1];
pointerToSmallest[1] = &array[0];
}
// Init remaining pointers until i= k_max
for(i=2; i< k_max;++i){
if( *pointerToSmallest[i-1] < array[i] ){
pointerToSmallest[i] = &array[i];
}else{
pointerToSmallest[i] = pointerToSmallest[i-1];
for(j=0; j<i-1 && *pointerToSmallest[i-2-j] > array[i];++j)
pointerToSmallest[i-1-j] = pointerToSmallest[i-2-j];
pointerToSmallest[i-1-j]=&array[i];
}
if((i+1)%STEP_SIZE==0){
k = ceil(0.05 * i)-1;
for(j=0; j<k; j++)
temp += *pointerToSmallest[j];
temp += 2 * (*pointerToSmallest[k]);
}
}
// Selection sort remaining elements
for( ; i< ARRAY_SIZE; ++i){
if( *pointerToSmallest[ last ] > array[i] ) {
for(j=0; j != last && *pointerToSmallest[ last-1-j] > array[i];++j)
pointerToSmallest[last-j] = pointerToSmallest[last-1-j];
pointerToSmallest[last-j] = &array[i];
}
if( (i+1)%STEP_SIZE==0){
k = ceil(0.05 * i)-1;
for(j=0; j<k; j++)
temp += *pointerToSmallest[j];
temp += 2 * (*pointerToSmallest[k]);
}
}
return temp;
}
int main(void){
int i,k_max = ceil( 0.05 * ARRAY_SIZE );
float* array = (float*)malloc ( ARRAY_SIZE * sizeof(float));
float** pointerToSmallest = (float**)malloc( k_max * sizeof(float*));
for( i=0; i<ARRAY_SIZE; i++)
array[i]= rand() / (float)RAND_MAX*100-50;
// just return a, so that the compiler doesn't drop the function call
float a = sortStream(array,pointerToSmallest, k_max);
return (int)a;
}
Thank you very much

By using two heaps to store all items from the stream, you can:
find the top p% elements in O(1);
update the data structure (the two heaps) in O(log N).
Assume we now have N elements and k = p% * N:
a min-heap (LargerPartHeap) stores the top k items;
a max-heap (SmallerPartHeap) stores the other (N - k) items;
every item in SmallerPartHeap is less than or equal to the minimum item of LargerPartHeap (the top item of LargerPartHeap).
1. For the query "what are the top p% elements?", simply return LargerPartHeap.
2. For the update "new element x from the stream":
2.a Check the new k' = (N + 1) * p%; if k' = k + 1, move the top of SmallerPartHeap to LargerPartHeap. - O(log N)
2.b If x is larger than the top element (the minimum) of LargerPartHeap, insert x into LargerPartHeap and move the top of LargerPartHeap to SmallerPartHeap; otherwise, insert x into SmallerPartHeap. - O(log N)
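A sketch of the two-heap idea in C, assuming double values, a fixed capacity known in advance (cap must be at least the number of items that will be added) and an integer percentage p. All names are hypothetical. Two deviations from the steps above are assumptions of this sketch: k is computed as ceil(N * p / 100) in integer arithmetic to avoid floating-point rounding, and the insertion (2.b) runs before the rebalancing (2.a), so the first element is handled correctly while both heaps are still empty. The max-heap is simulated by storing negated values in the same min-heap code:

```c
#include <stdlib.h>

/* Minimal binary min-heap of doubles with a fixed capacity. */
struct heap {
    double *v;
    size_t size;
};

static void heap_push(struct heap *h, double x)
{
    size_t i = h->size++;
    h->v[i] = x;
    while (i > 0 && h->v[(i - 1) / 2] > h->v[i]) {     /* sift up */
        size_t p = (i - 1) / 2;
        double t = h->v[i]; h->v[i] = h->v[p]; h->v[p] = t;
        i = p;
    }
}

static double heap_pop(struct heap *h)                  /* removes the minimum */
{
    double top = h->v[0];
    size_t i = 0;
    h->v[0] = h->v[--h->size];
    for (;;) {                                           /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->size && h->v[l] < h->v[m]) m = l;
        if (r < h->size && h->v[r] < h->v[m]) m = r;
        if (m == i) break;
        double t = h->v[i]; h->v[i] = h->v[m]; h->v[m] = t;
        i = m;
    }
    return top;
}

struct top_percent {
    struct heap larger;   /* LargerPartHeap: min-heap holding the top k items */
    struct heap smaller;  /* SmallerPartHeap: max-heap (negated values), the other N - k */
    size_t n;             /* N: items seen so far */
    unsigned pct;         /* p, in percent (1..100) */
};

/* k = ceil(n * pct / 100), in integer arithmetic. */
static size_t top_count(size_t n, unsigned pct) { return (n * pct + 99) / 100; }

int top_percent_init(struct top_percent *t, size_t cap, unsigned pct)
{
    t->larger.v = malloc(cap * sizeof(double));
    t->smaller.v = malloc(cap * sizeof(double));
    if (t->larger.v == NULL || t->smaller.v == NULL) {
        free(t->larger.v);
        free(t->smaller.v);
        return -1;
    }
    t->larger.size = t->smaller.size = 0;
    t->n = 0;
    t->pct = pct;
    return 0;
}

void top_percent_add(struct top_percent *t, double x)
{
    /* 2.b: x enters the top part only if it beats the current minimum there */
    if (t->larger.size > 0 && x > t->larger.v[0]) {
        heap_push(&t->larger, x);
        heap_push(&t->smaller, -heap_pop(&t->larger));
    } else {
        heap_push(&t->smaller, -x);
    }
    t->n++;
    /* 2.a: if k grew, move the largest item of the lower part up */
    if (top_count(t->n, t->pct) > t->larger.size)
        heap_push(&t->larger, -heap_pop(&t->smaller));
}

void top_percent_free(struct top_percent *t)
{
    free(t->larger.v);
    free(t->smaller.v);
}
```

For the question's smallest 5%, the roles swap: keep the k smallest items in a max-heap and the rest in a min-heap (or feed the negated values into this structure).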

I believe heap sort is far too complicated for this particular problem, even though that or other priority queue algorithms are well suited to get N minimum or maximum items from a stream.
The first thing to notice is the constraint 0.05 * 300 = 15: that is the maximum number of elements that have to be kept sorted at any moment. Also, each iteration adds 10 elements. The overall operation, done in place, could be:
for (i = 0; i < 30; i++)
{
if (i != 1)
qsort(input + i*10, 10, sizeof(input[0]), cmpfunc);
else
qsort(input, 20, sizeof(input[0]), cmpfunc);
if (i > 1)
merge_sort15(input, 15, input + i*10, 10, cmpfunc);
}
When i==1, one could also merge input and input+10 to produce a completely sorted array of 20 in place, since that has lower complexity than a generic sort. Here the "optimizing" is also about minimizing the number of primitives the algorithm needs.
merge_sort15 would consider only the first 15 elements of the first array and the first 10 elements of the next one.
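A sketch of what merge_sort15 could look like, assuming float data and a qsort-style comparator. The answer does not show its body, so this signature is a guess based on the call in the loop, and cmp_float is a hypothetical comparator. It keeps the na smallest elements of the two sorted runs, in order, at the front of the first array:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical comparator for floats, in the qsort style. */
static int cmp_float(const void *x, const void *y)
{
    float a = *(const float *)x, b = *(const float *)y;
    return (a > b) - (a < b);
}

/* a[0..na-1] and b[0..nb-1] are both sorted by cmp. Afterwards a[0..na-1]
 * holds the na smallest elements of the two runs, still sorted.
 * a and b must not overlap, and na must be at most 15. */
void merge_sort15(float *a, size_t na, const float *b, size_t nb,
                  int (*cmp)(const void *, const void *))
{
    float tmp[15];
    size_t i = 0, j = 0, t = 0;
    assert(na <= 15);
    while (t < na) {
        /* i < na always holds here, because t == i + j */
        if (j < nb && cmp(&b[j], &a[i]) < 0)
            tmp[t++] = b[j++];
        else
            tmp[t++] = a[i++];
    }
    memcpy(a, tmp, na * sizeof *a);
}
```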
EDIT The parameters of the problem will have a considerable effect in choosing the right algorithm; here selecting 'sort 10 items' as basic unit will allow one half of the problem to be parallelized, namely sorting 30 individual blocks of 10 items each -- a problem which can be efficiently solved with fixed pipeline algorithm using sorting networks. With different parametrization such an approach may not be feasible.

C: Looping the big array efficiently

I have a master list: an integer array of around 500 numbers. I also have a set of 100 random numbers picked from the master list, and I want to find the missing numbers. I need to check this random list against the master list. What would be the best approach in C to go through it without hanging the program? If I go through it with a simple 'for' loop over the 500 elements, it will hang because it needs to go through the entire list. Could someone point me in the right direction?
Thanks.

First, you should profile it. At most, we're talking about 500*100 = 50,000 operations. An average modern computer can finish that in under one-tenth of a second, unless you code it very inefficiently.
Assuming that you would like to optimize it anyway, you should sort the master array, and run a binary search on it for each element of the randomized array. This would reduce the number of operations from 50,000 to at most 900, because a binary search of 500 numbers requires at most 9 comparisons.
Here is an implementation that uses built-in sorting and binary search functions (qsort and bsearch) of the standard C library:
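#include <stdio.h>
#include <stdlib.h>
int less_int(const void* left, const void* right) {
int l = *((const int*)left), r = *((const int*)right);
return (l > r) - (l < r); // -1, 0 or 1 without risking overflow
}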
int main(void) {
size_t num_elements = 500;
int* a = malloc(num_elements*sizeof(int));
for(size_t i=0 ; i<num_elements ; i++) {
a[i] = rand() % num_elements;
}
qsort(a, num_elements, sizeof(int), less_int);
size_t num_rand = 100;
int* r = malloc(num_rand*sizeof(int));
for(size_t i=0 ; i < num_rand ; i++) {
r[i] = rand() % num_rand;
}
for (size_t i = 0 ; i != num_rand ; i++) {
int *p = (int*) bsearch (&r[i], a, num_elements, sizeof(int), less_int);
if (p) {
printf ("%d is in the array.\n", *p);
} else {
printf ("%d is not in the array.\n", r[i]);
}
}
free(a);
free(r);
return 0;
}

n: length of the randomized array.
m: length of the master list.
1. Sort the randomized array: n*log(n).
2. Binary-search the sorted array for every element of the master list; every element that is not found is a missing one: m*log(n).
=> (m+n)*log(n) for the whole operation. With n = 100 and m = 500 we get
600 * log2(100) ≈ 600 * 6.64 ≈ 3986 iterations, compared to 50,000 iterations with the naive approach.
PS: If both arrays are sorted, a single pass with O(m) comparisons should suffice.
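A sketch of the single pass from the PS, assuming both arrays are sorted in ascending order; find_missing_sorted is a hypothetical name. It walks both arrays once, like the merge step of merge sort, and collects the master-list values that never appear in the picked list:

```c
#include <stddef.h>

/* master[0..m-1] and picked[0..n-1] are sorted ascending. Writes to out every
 * element of master that does not occur in picked and returns how many were
 * written. O(m + n) comparisons. */
size_t find_missing_sorted(const int *master, size_t m,
                           const int *picked, size_t n, int *out)
{
    size_t i = 0, j = 0, count = 0;
    while (i < m) {
        if (j < n && picked[j] < master[i]) {
            j++;                            /* skip picked values below master[i] */
        } else if (j < n && picked[j] == master[i]) {
            i++;                            /* present in both: not missing */
        } else {
            out[count++] = master[i++];     /* picked is past master[i] or exhausted */
        }
    }
    return count;
}
```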

Storing numbers in array in C

I have three one-dimensional arrays. The task is to store the numbers that exist in all three arrays in a fourth array. Here is my solution, which, as you can see, isn't correct. I'm also interested in a faster algorithm if possible, because mine is O(N^3).
#include <stdio.h>
main(){
int a[5]={1,3,6,7,8};
int b[5]={2,5,8,7,3};
int c[5]={4,7,1,3,6};
int i,j,k;
int n=0;
int d[5];
for(k=0; k<5; k++){
for(j=0; j<5; j++){
for(i=0; i<5; i++){
if(a[i]==b[j] && b[j]==c[k])
{d[n]=a[i];
n++;}
else
d[n]=0;
}}}
//Iterate over the new array
for(n=0;n<5;n++)
printf("%d\n",d[n]);
return 0;
}

One way to improve to O(n log n) is to sort all three arrays first.
Then use three pointers, one for each array. Always advance the one that points to the lowest value, and after every such move check whether the three values are the same.
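A sketch of the three-pointer idea, assuming the three arrays have already been sorted (for example with qsort); intersect3_sorted is a hypothetical name:

```c
#include <stddef.h>

/* a, b and c are sorted ascending. Writes each value found in all three to
 * out (once per matching triple) and returns how many were written. */
size_t intersect3_sorted(const int *a, size_t na, const int *b, size_t nb,
                         const int *c, size_t nc, int *out)
{
    size_t i = 0, j = 0, k = 0, n = 0;
    while (i < na && j < nb && k < nc) {
        if (a[i] == b[j] && b[j] == c[k]) {
            out[n++] = a[i];
            i++; j++; k++;
        } else if (a[i] <= b[j] && a[i] <= c[k]) {
            i++;            /* a[i] is the lowest value: advance that pointer */
        } else if (b[j] <= c[k]) {
            j++;            /* b[j] is the lowest */
        } else {
            k++;            /* c[k] is the lowest */
        }
    }
    return n;
}
```

Sorting costs O(n log n); the scan itself is linear.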
To improve even further, you can use a hash table.
Iterate through the first array and put its values into a hash table as keys.
Then iterate through the second array, and every time a value exists as a key in the first hash table, put it into a second one.
Finally, iterate over the third array, and if a value exists as a key in the second hash table, store it in the fourth array. This is O(n), assuming the hash table operations are O(1).
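Standard C has no hash table, so this sketch includes a minimal open-addressing hash set with linear probing. Everything here is hypothetical: the fixed capacity SET_CAP (which must exceed the number of distinct values stored), the multiplicative hash, and the extra seen set that keeps duplicates in the third array from being reported twice:

```c
#include <stdbool.h>
#include <stddef.h>

#define SET_CAP 64   /* power of two; assumed larger than the number of values stored */

/* Minimal open-addressing hash set of ints (fixed capacity, no deletion). */
struct int_set {
    int key[SET_CAP];
    bool used[SET_CAP];
};

static size_t slot_for(int x) { return ((unsigned)x * 2654435761u) & (SET_CAP - 1); }

static bool set_contains(const struct int_set *s, int x)
{
    for (size_t i = slot_for(x); s->used[i]; i = (i + 1) & (SET_CAP - 1))
        if (s->key[i] == x)
            return true;
    return false;
}

static void set_add(struct int_set *s, int x)
{
    size_t i = slot_for(x);
    while (s->used[i] && s->key[i] != x)   /* linear probing */
        i = (i + 1) & (SET_CAP - 1);
    s->used[i] = true;
    s->key[i] = x;
}

/* Values occurring in a, b and c, in the order they appear in c; each value
 * is reported once. Expected O(na + nb + nc). */
size_t intersect3_hash(const int *a, size_t na, const int *b, size_t nb,
                       const int *c, size_t nc, int *out)
{
    struct int_set first = {0}, second = {0}, seen = {0};
    size_t n = 0;
    for (size_t i = 0; i < na; i++)
        set_add(&first, a[i]);
    for (size_t i = 0; i < nb; i++)
        if (set_contains(&first, b[i]))
            set_add(&second, b[i]);
    for (size_t i = 0; i < nc; i++)
        if (set_contains(&second, c[i]) && !set_contains(&seen, c[i])) {
            set_add(&seen, c[i]);
            out[n++] = c[i];
        }
    return n;
}
```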

Your mistake is that you're using one of your three nested counters (which are being used to index the input arrays) as the index into the output array. You need a fourth index (let's call it n) that starts at zero and is incremented only when a satisfactory value has been found.

Sort the second and third arrays beforehand and use binary search on them to determine whether an element is present.
If an element is present in all of your arrays, it is present in the first one. So go through the first (unsorted) array and check whether each of its elements is in the second and third.
If you take the shortest array as the first one, the algorithm will be slightly faster too.
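A sketch of this approach using the standard qsort() and bsearch(); intersect3_bsearch and cmp_int are hypothetical names, and the function sorts the second and third arrays in place:

```c
#include <stdlib.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts b and c in place, then keeps each element of a that bsearch finds in
 * both. Returns how many values were written to out. */
size_t intersect3_bsearch(const int *a, size_t na, int *b, size_t nb,
                          int *c, size_t nc, int *out)
{
    size_t n = 0;
    qsort(b, nb, sizeof *b, cmp_int);
    qsort(c, nc, sizeof *c, cmp_int);
    for (size_t i = 0; i < na; i++)
        if (bsearch(&a[i], b, nb, sizeof *b, cmp_int) != NULL &&
            bsearch(&a[i], c, nc, sizeof *c, cmp_int) != NULL)
            out[n++] = a[i];
    return n;
}
```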

You didn't store them in d[] the right way.
Once a match is found, you can skip the rest of a[] and b[] for that element of c[].
#include <stdio.h>
int main(void){
int a[5]={1,3,6,7,8};
int b[5]={2,5,8,7,3};
int c[5]={4,7,1,3,6};
int i,j,k;
int n=0;
int found;
int d[5];
for(k=0; k<5; k++){
found=0;
for(j=0; j<5 && !found; j++){
if (b[j]==c[k]) {
for(i=0; i<5 && !found; i++){
if(a[i]==b[j]) {
d[n++]=c[k];
found=1;
}
}
}}}
//Iterate over the new array
for(i=0;i<n;i++) // print only the n values actually stored
printf("%d\n",d[i]);
return 0;
}
