Finding the number of shifts in insertion sort for large inputs in C

I'm trying to write a program that counts the number of swaps made by insertion sort. My program works on small inputs, but produces the wrong answer on large inputs. I'm also not sure how to use the long int type.
This problem came up in a setting described at https://drive.google.com/file/d/0BxOMrMV58jtmNF9EcUNQNGpreDQ/edit?usp=sharing
Input is given as
The first line contains the number of test cases T. T test cases follow.
The first line for each case contains N, the number of elements to be sorted.
The next line contains N integers a[1], a[2], ..., a[N].
Code I used is
#include <stdio.h>
#include <stdlib.h>

int insertionSort(int ar_size, int *ar)
{
    int i, j, temp, count;
    count = 0;
    int n = ar_size;
    for (i = 0; i < n - 1; i++)
    {
        j = i;
        while (j >= 0 && ar[j + 1] < ar[j])   /* guard j >= 0 so we never read ar[-1] */
        {
            temp = ar[j + 1];
            ar[j + 1] = ar[j];
            ar[j] = temp;
            j--;
            count++;
        }
    }
    return count;
}

int main()
{
    int _ar_size, tc, i, _ar_i;
    scanf("%d", &tc);
    int sum = 0;
    for (i = 0; i < tc; i++)
    {
        scanf("%d", &_ar_size);
        int *_ar;
        _ar = (int *)malloc(sizeof(int) * _ar_size);
        for (_ar_i = 0; _ar_i < _ar_size; _ar_i++)
        {
            scanf("%d", &_ar[_ar_i]);
        }
        sum = insertionSort(_ar_size, _ar);
        printf("%d\n", sum);
        free(_ar);
    }
    return 0;
}

There are two issues that I currently see with the solution you have.
First, there's an issue brought up in the comments about integer overflow. On most systems, the int type can hold numbers up through 2^31 - 1. In insertion sort, the number of swaps that need to be made in the worst case on an array of length n is n(n - 1) / 2 (details later), so for an array of size 2^17, you may end up not being able to store the number of swaps that you need inside an int. To address this, consider using a larger integer type. For example, the uint64_t type can store numbers up to roughly 10^18, which should be good enough to store the answer for arrays up to length around 10^9. You mentioned that you're not sure how to use it, but the good news is that it's not that hard. Just add the line
#include <stdint.h>
(for C) or
#include <cstdint>
(for C++) to the top of your program. After that, you can use uint64_t in place of int for the counter; the only other change you need is the printf format specifier, since %d does not match a 64-bit value (use the PRIu64 macro from <inttypes.h>, or %llu after a cast).
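As a minimal sketch of that change applied to your counting function (the renamed function and the PRIu64 print call are just illustrations, assuming the rest of your program stays as it is):

#include <stdint.h>
#include <inttypes.h>

/* Same swap-counting loop as before, but the counter is a 64-bit unsigned
   value, so n(n - 1) / 2 fits comfortably for any realistic input size. */
uint64_t insertionSortCount(int ar_size, int *ar)
{
    uint64_t count = 0;
    for (int i = 0; i < ar_size - 1; i++) {
        int j = i;
        while (j >= 0 && ar[j + 1] < ar[j]) {
            int temp = ar[j + 1];
            ar[j + 1] = ar[j];
            ar[j] = temp;
            j--;
            count++;
        }
    }
    return count;
}

/* ... and in main(): */
/* printf("%" PRIu64 "\n", insertionSortCount(_ar_size, _ar)); */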
Next, there's an issue of efficiency. The code you've posted essentially runs insertion sort and therefore takes time O(n^2) in the worst case. For large inputs - say, inputs around size 10^8 - this is prohibitively expensive. Amazingly, though, you can actually determine how many swaps insertion sort will make without actually running insertion sort.
In insertion sort, the number of swaps made is equal to the number of inversions that exist in the input array (an inversion is a pair of elements that are out of order). There's a beautiful divide-and-conquer algorithm for counting inversions that runs in time O(n log n), which likely will scale up to work on much larger inputs than just running insertion sort. I think that the "best" answer to this question would be to use this algorithm, while taking care to use the uint64_t type or some other type like it, since it will make your algorithm work correctly on much larger inputs.
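If you want to go that route, here is one possible sketch of the divide-and-conquer count: a merge sort that tallies inversions as it merges. The function names and the scratch-buffer arrangement are my own choices, not the only way to write it.

#include <stdint.h>
#include <string.h>

/* Merge the two sorted halves [lo, mid) and [mid, hi), counting how many
   times an element from the right half is placed before one from the left:
   each such placement corresponds to (mid - i) inversions. */
static uint64_t mergeCount(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    uint64_t inv = 0;
    while (i < mid && j < hi) {
        if (a[j] < a[i]) {
            inv += (uint64_t)(mid - i);
            tmp[k++] = a[j++];
        } else {
            tmp[k++] = a[i++];
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
    return inv;
}

static uint64_t countInversions(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return 0;
    int mid = lo + (hi - lo) / 2;
    uint64_t inv = countInversions(a, tmp, lo, mid)
                 + countInversions(a, tmp, mid, hi);
    return inv + mergeCount(a, tmp, lo, mid, hi);
}

Called as countInversions(ar, scratch, 0, n) with a scratch buffer of n ints, this returns the same count your insertion sort would, in O(n log n) time (it sorts the array as a side effect).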

Related

Processes for Insertion sort

I've been learning sorting algorithms for a couple of days. Presently I'm doing Insertion Sort. So the general algorithm is:
void insertionSort(int N, int arr[]) {
    int i, j;
    int value;
    for (i = 1; i < N; i++)
    {
        value = arr[i];
        j = i - 1;
        while (j >= 0 && value < arr[j])
        {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = value;
    }
    for (j = 0; j < N; j++)
    {
        printf("%d ", arr[j]);
    }
    printf("\n");
}
Now I've done this:
void print_array(int arr_count, int *arr) {
    int i;
    for (i = 0; i < arr_count; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

void swap(int *m, int *n) {
    int t = 0;
    t = *m;
    *m = *n;
    *n = t;
}

void insertionSort(int arr_count, int *arr) {
    int i, j;
    for (i = 0; i < arr_count; i++) {
        for (j = 0; j < i; j++) {
            if (arr[i] < arr[j]) {
                swap(arr + i, arr + j);
            }
        }
        //if (i != 0)
        //    print_array(arr_count, arr);
    }
    print_array(arr_count, arr);
}
Now, my question is: what's the difference between my custom approach and the traditional approach? Both have O(N^2) complexity.
Please help.
Thanks in advance.
At each iteration, the original code you present moves each element into place by moving elements in a cycle. For an n-element cycle, that involves n+1 assignments.
It is possible to implement Insertion Sort by moving elements with pairwise swaps instead of in larger cycles. It is sometimes taught that way, in fact. This is possible because any permutation (not just cycles) can be expressed as a series of swaps. Implementing an n-element cycle via swaps requires n-1 swaps, and each swap, being a 2-element cycle, requires 2+1 = 3 assignments. For cycles larger than two elements, then, the approach using pairwise swaps does more work, scaling as 3*(n-1) as opposed to n+1. That does not change the asymptotic complexity, however, as you can see by the fact that the exponent of n does not change.
But note another key difference between the original code and yours: the original code scans backward through the list to find the insertion position, whereas you scan forward. Whether you use pairwise swaps or a larger cycle, scanning backward has the advantage that you can perform the needed reordering as you go, so that once you find the insertion position, you are done. This is one of the things that makes Insertion Sort so good among comparison sorts, and why it is especially fast for inputs that are initially nearly sorted.
Scanning forward means that once you find the insertion position, you've only started: you then still have to cycle the elements. As a result, your approach examines every element of the sorted prefix on every iteration. Additionally, when it actually performs the reordering, it does a bunch of unneeded comparisons. It could instead use the knowledge that the head of the list started out sorted, and just perform a cycle (either way) without any more comparisons. The extra comparisons disguise the fact that the code is just performing the appropriate element cycling at that point (did you realize that?), and that is probably why several people mistook your implementation for a Bubble Sort.
Technically, yours is still an Insertion Sort, but it is an implementation that takes no advantage of the characteristics of the abstract Insertion Sort algorithm that give well-written implementations an advantage over other sorts of the same asymptotic complexity.
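For reference, a pairwise-swap variant of the backward-scanning Insertion Sort described above might look like this (a sketch, not the original poster's code); note that it still stops as soon as the element reaches its insertion position:

/* Backward-scanning insertion sort using pairwise swaps: three assignments
   per swap instead of the shift-based version's one, but the inner loop
   still stops as soon as the element reaches its insertion position. */
void insertionSortSwaps(int n, int *arr)
{
    for (int i = 1; i < n; i++) {
        for (int j = i - 1; j >= 0 && arr[j + 1] < arr[j]; j--) {
            int t = arr[j];
            arr[j] = arr[j + 1];
            arr[j + 1] = t;
        }
    }
}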
The main difference between the insertion sort algorithm and your custom algorithm is the direction of processing. Insertion sort moves the smaller elements in range one by one to the left, while your algorithm moves the larger elements in range one by one to the right.
Another key difference is the best-case time complexity of insertion sort versus your algorithm.
Insertion sort stops as soon as the condition value < arr[j] is no longer satisfied, so it has a best-case complexity of O(n) (when the array is already sorted), while your algorithm always searches from index 0 up to i, so it takes O(n^2) steps even when the array is already sorted.
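To see that best-case gap concretely, here is a small instrumented sketch (the comparison counter and the helper names are my additions for illustration): on an already-sorted array of 1000 elements, the backward-scanning version makes 999 comparisons while the forward-scanning version makes 499,500.

#include <stdio.h>

static long comparisons;

static int less(int a, int b) { comparisons++; return a < b; }

static void standardInsertionSort(int n, int *arr)
{
    for (int i = 1; i < n; i++) {
        int value = arr[i];
        int j = i - 1;
        while (j >= 0 && less(value, arr[j])) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = value;
    }
}

static void forwardInsertionSort(int n, int *arr)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            if (less(arr[i], arr[j])) {
                int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
            }
}

int main(void)
{
    enum { N = 1000 };
    int a[N], b[N];
    for (int i = 0; i < N; i++) a[i] = b[i] = i;   /* already-sorted input */

    comparisons = 0;
    standardInsertionSort(N, a);
    printf("backward scan: %ld comparisons\n", comparisons);   /* N - 1 */

    comparisons = 0;
    forwardInsertionSort(N, b);
    printf("forward scan:  %ld comparisons\n", comparisons);   /* N(N-1)/2 */

    return 0;
}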

Fastest algorithm to figure out if an array has at least one duplicate

I have a quite peculiar case here. I have a file containing several million entries and want to find out if there exists at least one duplicate. The language here isn't of great importance, but C seems like a reasonable choice for speed. Now, what I want to know is what kind of approach to take to this? Speed is the primary goal here. Naturally, we want to stop looking as soon as one duplicate is found, that's clear, but when the data comes in, I don't know anything about how it's sorted. I just know it's a file of strings, separated by newline. Now keep in mind, all I want to find out is if a duplicate exists. Now, I have found a lot of SO questions regarding finding all duplicates in an array, but most of them go the easy and comprehensive way, rather than the fastest.
Hence, I'm wondering: what is the fastest way to find out if an array contains at least one duplicate? So far, the closest I've been able to find on SO is this: Finding out the duplicate element in an array. The language chosen isn't important, but since it is, after all, programming, multi-threading would be a possibility (I'm just not sure if that's a feasible way to go about it).
Finally, the strings have a format of XXXNNN (3 characters and 3 integers).
Please note that this is not strictly theoretical. It will be tested on a machine (Intel i7 with 8GB RAM), so I do have to take into consideration the time of making a string comparison etc. Which is why I'm also wondering if it could be faster to split the strings in two, and first compare the integer part, as an int comparison will be quicker, and then the string part? Of course, that will also require me to split the string and cast the second half to an int, which might be slower...
Finally, the strings have a format of XXXNNN (3 characters and 3 integers).
Knowing your key domain is essential to this sort of problem, so this allows us to massively simplify the solution (and this answer).
If X ∈ {A..Z} and N ∈ {0..9}, that gives 26^3 * 10^3 = 17,576,000 possible values ... a bitset (essentially a trivial, perfect Bloom filter with no false positives) would take ~2 MB for this.
Here you go: a python script to generate all possible 17 million keys:
import itertools
from string import ascii_uppercase

for prefix in itertools.product(ascii_uppercase, repeat=3):
    for numeric in range(1000):
        print "%s%03d" % (''.join(prefix), numeric)
and a simple C bitset filter:
#include <limits.h>

/* convert number of bits into number of bytes */
int filterByteSize(int max) {
    return (max + CHAR_BIT - 1) / CHAR_BIT;
}

/* set bit #value in the filter, returning non-zero if it was already set */
int filterTestAndSet(unsigned char *filter, int value) {
    int byteIndex = value / CHAR_BIT;
    unsigned char mask = 1 << (value % CHAR_BIT);
    unsigned char byte = filter[byteIndex];
    filter[byteIndex] = byte | mask;
    return byte & mask;
}
which for your purposes you'd use like so:
#include <stdlib.h>

/* allocate filter suitable for this question */
unsigned char *allocMyFilter() {
    int maxKey = 26 * 26 * 26 * 10 * 10 * 10;
    return calloc(filterByteSize(maxKey), 1);
}

/* key conversion - yes, it's horrible */
int testAndSetMyKey(unsigned char *filter, char *s) {
    int alpha = s[0]-'A' + 26*(s[1]-'A' + 26*(s[2]-'A'));
    int numeric = s[3]-'0' + 10*(s[4]-'0' + 10*(s[5]-'0'));
    int key = numeric + 1000 * alpha;
    return filterTestAndSet(filter, key);
}

#include <stdio.h>

int main() {
    unsigned char *filter = allocMyFilter();
    char key[8]; /* 6 chars + newline + nul */
    while (fgets(key, sizeof(key), stdin)) {
        if (testAndSetMyKey(filter, key)) {
            printf("collision: %s\n", key);
            return 1;
        }
    }
    return 0;
}
This is linear, although there's obviously scope to optimise the key conversion and file input. Anyway, sample run:
useless:~/Source/40044744 $ python filter_test.py > filter_ok.txt
useless:~/Source/40044744 $ time ./filter < filter_ok.txt
real 0m0.474s
user 0m0.436s
sys 0m0.036s
useless:~/Source/40044744 $ cat filter_ok.txt filter_ok.txt > filter_fail.txt
useless:~/Source/40044744 $ time ./filter < filter_fail.txt
collision: AAA000
real 0m0.467s
user 0m0.452s
sys 0m0.016s
admittedly the input file is cached in memory for these runs.
The reasonable answer is to keep the algorithm with the smallest complexity. I encourage you to use a hash table to keep track of inserted elements; the final algorithm complexity is O(n), because searching a hash table is O(1) in theory. In your case I suggest running the algorithm while reading the file.
public static bool ThereAreDuplicates(string[] inputs)
{
    var hashTable = new Hashtable();
    foreach (var input in inputs)
    {
        if (hashTable[input] != null)
            return true;
        hashTable.Add(input, string.Empty);
    }
    return false;
}
A fast but memory-inefficient solution would use
// Entries are AAA####
char found[(size_t)36*36*36*36*36*36 /* 2,176,782,336 */] = { 0 };  // or calloc() this
char buffer[100];

while (fgets(buffer, sizeof buffer, istream)) {
    unsigned long index = strtoul(buffer, NULL, 36);
    if (found[index]++) {
        Dupe_found();
        break;
    }
}
The trouble with the post is that it asks for the "fastest algorithm" but does not detail memory constraints or their importance relative to speed. So speed must be king, and the above wastes little time. It does meet the "stop looking as soon as one duplicate is found" requirement.
Depending on how many different values there can be, you have some options:
Sort the whole array and then scan for a repeated (now adjacent) element: complexity O(n log n), but it can be done in place, so memory is O(1). A sketch of this option appears below the list.
Build a set of all elements. Depending on the chosen set implementation this can be O(n) (hash set) or O(n log n) (binary tree), but it would cost you some memory to do so.
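A minimal sketch of the first option, assuming the keys have already been read into a flat array of NUL-terminated 6-character XXXNNN strings (the helper names are mine):

#include <stdlib.h>
#include <string.h>

/* Sort the keys, then a single pass over adjacent pairs reveals whether any
   duplicate exists. Each key occupies 7 bytes: "XXXNNN" plus the NUL. */
static int cmpkeys(const void *a, const void *b)
{
    return memcmp(a, b, 7);
}

int hasDuplicate(char (*keys)[7], size_t n)
{
    qsort(keys, n, sizeof keys[0], cmpkeys);
    for (size_t i = 1; i < n; i++)
        if (memcmp(keys[i - 1], keys[i], 7) == 0)
            return 1;   /* found a duplicate */
    return 0;
}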
The fastest way to find out if an array contains at least one duplicate is to use a bitmap, multiple CPUs and an (atomic or not) "test and set bit" instruction (e.g. lock bts on 80x86).
The general idea is to divide the array into "total elements / number of CPUs" sized pieces and give each piece to a different CPU. Each CPU processes its piece of the array by calculating an integer and doing the atomic "test and set bit" for the bit corresponding to that integer.
However, the problem with this approach is that you're modifying something that all CPUs are using (the bitmap). A better idea is to give each CPU a range of integers (e.g. CPU number N handles all integers from "min + (max - min) * N / CPUs" to "min + (max - min) * (N+1) / CPUs"). This means that all CPUs read the entire array, but each CPU only modifies its own private piece of the bitmap. This avoids some performance problems involved with cache coherency protocols ("read for ownership of cache line") and also avoids the need for atomic instructions.
The next step beyond that is to look at how you're converting your "3 characters and 3 digits" strings into an integer. Ideally, this can/would be done using SIMD, which would require that the array is in "structure of arrays" format (and not the more likely "array of structures" format). Also note that you can convert the strings to integers first (in an "each CPU does a subset of the strings" way) to avoid the need for each CPU to convert each string, and pack more into each cache line.
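A rough pthreads sketch of the "each CPU owns a private slice of the bitmap" idea described above; the thread count, the keys being converted up front, and the byte-aligned slice boundaries are all my assumptions for illustration:

#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS 4
#define KEY_SPACE   (26 * 26 * 26 * 1000)   /* XXXNNN -> 17,576,000 possible keys */

static int *g_keys;                          /* all keys, already converted to ints */
static size_t g_count;                       /* how many keys */
static unsigned char g_bitmap[(KEY_SPACE + 7) / 8];   /* one-shot: not reset between calls */

struct slice { int lo, hi, found; };         /* this thread owns keys in [lo, hi) */

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (size_t i = 0; i < g_count; i++) {
        int k = g_keys[i];
        if (k < s->lo || k >= s->hi)
            continue;                        /* some other thread's key */
        unsigned char mask = 1u << (k % 8);
        if (g_bitmap[k / 8] & mask) {        /* seen before: duplicate */
            s->found = 1;
            return NULL;
        }
        g_bitmap[k / 8] |= mask;             /* no atomics needed: private byte range */
    }
    return NULL;
}

int duplicateExists(int *keys, size_t count)
{
    pthread_t tid[NUM_THREADS];
    struct slice sl[NUM_THREADS];
    int chunk = ((KEY_SPACE / NUM_THREADS) + 7) / 8 * 8;   /* byte-aligned slices */

    g_keys = keys;
    g_count = count;
    for (int t = 0; t < NUM_THREADS; t++) {
        sl[t].lo = t * chunk;
        sl[t].hi = (t == NUM_THREADS - 1) ? KEY_SPACE : (t + 1) * chunk;
        sl[t].found = 0;
        pthread_create(&tid[t], NULL, worker, &sl[t]);
    }
    int any = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(tid[t], NULL);
        any |= sl[t].found;
    }
    return any;
}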
Since you have several million entries, I think the best algorithm would be counting sort. Counting sort does exactly what you asked: it sorts an array by counting how many times every element occurs. So you could write a function that applies that counting to the array:
#include <stdlib.h>

/* Returns 1 as soon as some value occurs twice, 0 otherwise. */
int counting_sort(int a[], int n, int max)
{
    int *count = calloc(max + 1, sizeof *count);  /* counts start at zero */
    for (int i = 0; i < n; ++i) {
        count[a[i]]++;
        if (count[a[i]] >= 2) {
            free(count);
            return 1;
        }
    }
    free(count);
    return 0;
}
You should first find the max element (which takes O(n)). The asymptotic time complexity of counting sort is O(max(n, M)), where M is the maximum value found in the array. Because you have several million entries, if M is also on the order of some millions this works out to O(n) (the early exit can finish sooner, but finding M already costs O(n)). If you also know that M can never exceed a few million, you can be sure this gives O(n) and not just O(max(n, M)).
You can see counting sort visualization to understand it better, here:
https://www.cs.usfca.edu/~galles/visualization/CountingSort.html
Note that the above function doesn't implement counting sort exactly; it stops as soon as it finds a duplicate, which is even more efficient, since you only want to know whether a duplicate exists.

What kind of drawbacks are there performance-wise, if I sort an array by using hashing?

I wrote a simple function to sort an array int a[] using a hash.
For that I stored the frequency of every element in a new array hash1[] and then put the elements back into the original array in linear time.
#include<bits/stdc++.h>
using namespace std;

int hash1[10000];

void sah(int a[], int n)
{
    int maxo = -1;
    for (int i = 0; i < n; i++)
    {
        hash1[a[i]]++;
        if (maxo < a[i]) { maxo = a[i]; }
    }
    int i = 0, freq = 0, idx = 0;
    while (i < maxo + 1)
    {
        freq = hash1[i];
        if (freq > 0)
        {
            while (freq > 0)
            {
                a[idx++] = i;
                freq--;
            }
        }
        i++;
    }
}

int main()
{
    int a[] = {6, 8, 9, 22, 33, 59, 12, 5, 99, 12, 57, 7};
    int n = sizeof(a) / sizeof(a[0]);
    sah(a, n);
    for (int i = 0; i < n; i++)
    {
        printf("%d ", a[i]);
    }
}
This algorithm runs in O(max_element). What kind of disadvantages am I facing here, considering only performance (time and space)?
The algorithm you've implemented is called counting sort. Its runtime is O(n + U), where n is the total number of elements and U is the maximum value in the array (assuming the numbers go from 0 to U), and its space usage is Θ(U). Your particular implementation assumes that U = 10,000. Although you've described your approach as "hashing," this really isn't so much a hash (computing some function of the elements and using that to put them into buckets) as a distribution (spreading elements around according to their values).
If U is a fixed constant - as it is in your case - then the runtime is O(n) and the space usage is O(1), though remember that big-O talks about long-term growth rates and that if U is large the runtime can be pretty high. This makes it attractive if you're sorting very large arrays with a restricted range of values. However, if the range of values can be large, this is not a particularly good approach. Interestingly, you can think of radix sort as an algorithm that repeatedly runs counting sort with U = 10 (if using the base-10 digits of the numbers) or U = 2 (if going in binary) and has a runtime of O(n log U), which is strongly preferable for large values of U.
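To make the "repeated counting sort" view of radix sort concrete, here is a rough LSD radix sort sketch for unsigned 32-bit values, with U = 256 per pass (one stable counting pass per byte); it is an illustration of the idea, not a drop-in replacement for the code above:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort: four stable counting-sort passes, one per byte of the
   32-bit value, so the total work is O(n) for each of the four passes. */
void radix_sort_u32(uint32_t *a, size_t n)
{
    uint32_t *buf = malloc(n * sizeof *buf);
    if (!buf) return;

    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        for (size_t i = 0; i < n; i++)          /* histogram of this byte */
            count[(a[i] >> shift) & 0xFF]++;

        size_t pos = 0;                         /* prefix sums -> start offsets */
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        for (size_t i = 0; i < n; i++)          /* stable scatter into buf */
            buf[count[(a[i] >> shift) & 0xFF]++] = a[i];

        memcpy(a, buf, n * sizeof *a);          /* copy back for the next pass */
    }
    free(buf);
}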
You can clean up this code in a number of ways. For example, you have an if statement and a while loop with the same condition, which can be combined together into a single while loop. You also might want to put in some assert checks to make sure all the values are in the range from 0 to 9,999, inclusive, since otherwise you'll have a bounds error. Additionally, you could consider making the global array either a local variable (though watch your stack usage) or a static local variable (to avoid polluting the global namespace). You could alternatively have the user pass in a parameter specifying the maximum size or could calculate it yourself.
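For instance, a cleaned-up version along those lines might look like the following; the parameter for the maximum value and the asserts are my additions illustrating the suggestions, not the only way to do it:

#include <assert.h>

/* Counting-sort style rewrite of sah(): the caller passes the maximum
   allowed value, the counter array is a static local rather than a global,
   and out-of-range inputs are caught by an assert instead of silently
   corrupting memory. */
void sah(int a[], int n, int max_value)
{
    static int counts[10000];                 /* assumes max_value < 10000 */
    assert(max_value < 10000);

    for (int i = 0; i <= max_value; i++)      /* reset between calls */
        counts[i] = 0;

    for (int i = 0; i < n; i++) {
        assert(a[i] >= 0 && a[i] <= max_value);
        counts[a[i]]++;
    }

    int idx = 0;
    for (int value = 0; value <= max_value; value++) {
        int freq = counts[value];
        while (freq > 0) {                    /* single loop replaces the if + while */
            a[idx++] = value;
            freq--;
        }
    }
}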
Issues you may consider:
Input validation. What if the user enters -10 or a very large value?
If the maximum element is large, you will at some point take a performance hit when the L1 cache is exhausted. The hash1 array will compete for memory bandwidth with the a array. When I implemented radix sorting in the past, I found that 8 bits per iteration was fastest.
The time complexity is actually O(max_element + number_of_elements). E.g., what if you sorted 2 million ones or zeros? It is not as fast as sorting 2 ones or zeros.

Homework: Creating O(n) algorithm for sorting

I am taking the cs50 course on edx and am doing the hacker edition of pset3 (in essence it is the advanced version).
Basically the program takes a value to be searched for as the command-line argument, and then asks for a bunch of numbers to be used in an array.
Then it sorts and searches that array for the value entered at the command-line.
The way the program is implemented, it uses a pseudo-random number generator to feed the numbers for the array.
The task is to write the search and sorting functions.
I already have the searching function, but the sorting function is supposed to be O(n).
In the regular version you were supposed to use an O(n^2) algorithm, which wasn't a problem to implement. Using an O(n log n) algorithm wouldn't be an issue either.
But the problem set specifically asks for an O(n) algorithm.
It gives a hint: no number in the array is going to be negative, and none is greater than LIMIT (the numbers output by the generator are modulo'd so they are not greater than 65000). But how does that help in getting the algorithm to be O(n)?
But the counting sort algorithm, which purports to be an acceptable solution, returns a new sorted array rather than actually sorting the original one, and that contradicts the pset specification, which reads 'As this return type of void implies, this function must not return a sorted array; it must instead "destructively" sort the actual array that it’s passed by moving around the values therein.'
Also, if we decide to copy the sorted array onto the original one using another loop, with so many consecutive loops, I'm not sure if the sorting function can be considered to have a running time of O(n) anymore. Here is the actual pset, the question is about the sorting part.
Any ideas on how to implement such an algorithm would be greatly appreciated. It's not necessary to provide actual code, rather just the logic of how you can create an O(n) algorithm under the conditions provided.
It gives a hint: no number in the array is going to be negative, and none is greater than LIMIT (the numbers outputted by the generator are modulo'd to not be higher than 65000). But how does that help in getting the algorithm to be O(n)?
That hint directly seems to point towards counting sort.
You create 65000 buckets and use them to count the number of occurrences of each number.
Then, you just revisit the buckets and you have the sorted result.
It takes advantage of the fact that:
They are integers.
They have a limited range.
Its complexity is O(n), and as this is not a comparison-based sort, the O(n log n) lower bound on sorting does not apply. A very good visualization is here.
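A sketch of how the counting-sort idea can "destructively" sort the array in place, as the pset requires; the fixed LIMIT constant and this exact prototype are assumptions for illustration:

#include <string.h>

#define LIMIT 65536   /* values are known to lie in [0, 65000] */

/* Counting sort that overwrites the caller's array rather than returning a
   new one: count occurrences, then write the values back in order. Both
   passes are O(n + LIMIT), which is O(n) for a fixed LIMIT. */
void sort(int values[], int n)
{
    static int counts[LIMIT];
    memset(counts, 0, sizeof counts);

    for (int i = 0; i < n; i++)          /* first pass: histogram */
        counts[values[i]]++;

    int idx = 0;
    for (int v = 0; v < LIMIT; v++)      /* second pass: rewrite in place */
        while (counts[v]-- > 0)
            values[idx++] = v;
}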
As @DarkCthulhu said, counting sort is clearly what they were urging you to use. But you could also use a radix sort.
Here is a particularly concise radix-2 sort that exploits a nice connection to Gray codes. In your application it would require 16 passes over the input, one per data bit. For big inputs, the counting sort is likely to be faster. For small ones, the radix sort ought to be faster because you avoid initializing 256K bytes or more of counters.
See this article for explanation.
/* safe_malloc: assumed to be a malloc wrapper that aborts on failure */
void sort(unsigned short *a, int len)
{
    unsigned short bit, *s = a, *d = safe_malloc(len * sizeof *d), *t;
    unsigned is, id0, id1;

    for (bit = 1; bit; bit <<= 1, t = s, s = d, d = t)
        for (is = id0 = 0, id1 = len; is < len; ++is)
            if (((s[is] >> 1) ^ s[is]) & bit)
                d[--id1] = s[is];
            else
                d[id0++] = s[is];
    free(d);
}

Counting unique element in large array

One of my colleagues was asked this question in an interview.
Given a huge array which stores unsigned ints. The length of the array is 100000000. Find an effective way to count the occurrences of each unique element present in the array.
E.g. arr = {2,34,5,6,7,2,2,5,1,34,5}
O/p: Count of 2 is 3, count of 34 is 2, and so on.
What are effective algorithms to do this? I thought at first a dictionary/hash would be one of the options, but since the array is very large it is inefficient. Is there any way to do this?
Heap sort is O(n log n) and in-place. In-place is necessary when dealing with large data sets. Once sorted, you can make one pass through the array tallying occurrences of each value. Because the array is sorted, once a value changes you know you've seen all occurrences of the previous value.
Many other posters have suggested sorting the data and then finding the number of adjacent values, but no one has mentioned using radix sort yet to get the runtime to be O(n lg U) (where U is the maximum value in the array) instead of O(n lg n). Since lg U = O(lg n), assuming that integers take up one machine word, this approach is asymptotically faster than heapsort.
Non-comparison sorts are always fun in interviews. :-)
Sort it, then scan it from the beginning to determine the counts for each item.
This approach requires no additional storage, and can be done in O(n log n) time (for the sort).
If the range of the int values is limited, then you may allocate an array, which serves to count the occurrences for each possible value. Then you just iterate through your huge array and increment the counters.
foreach x in huge_array {
counter[x]++;
}
Thus you find the solution in linear time (O(n)), but at the expense of memory consumption. That is, if your ints span the whole range allowed by 32-bit ints, you would need to allocate an array of 4G ints, which is impractical...
How about using a Bloom filter implementation, like http://code.google.com/p/java-bloomfilter/ ?
First do a bloom.contains(element); if true, continue; if false, do bloom.add(element).
At the end, count the number of elements added. The Bloom filter needs approx. 250mb of memory to store 100000000 elements at 10 bits per element.
The problem is that false positives are possible in Bloom filters, and they can only be minimized by increasing the number of bits per element. This could be addressed by two Bloom filters with different hashing that both need to agree.
Hashing in this case is not inefficient. The cost will be approximately O(N): O(N) for iterating over the array and roughly O(N) for iterating over the hash table. Since you need O(N) just to check each element anyway, the complexity is good.
Sorting is a good idea. However, which sort to use depends on the range of possible values: for a small range, counting sort would be good. When dealing with such a big array it would be efficient to use multiple cores, and radix sort might be good there.
Here is a variation of the problem that might help you find the number of distinct elements.
#include <bits/stdc++.h>
using namespace std;
#define ll long long int
#define ump unordered_map

void file_i_o()
{
    ios_base::sync_with_stdio(0);
    cin.tie(0);
    cout.tie(0);
#ifndef ONLINE_JUDGE
    freopen("input.txt", "r", stdin);
    freopen("output.txt", "w", stdout);
#endif
}

int main() {
    file_i_o();
    ll t;
    cin >> t;
    while (t--)
    {
        int n, q;
        cin >> n >> q;
        ump<int, int> num;               // value -> how many times it currently appears
        int x;
        int arr[n + 1];
        int a, b;
        for (int i = 1; i <= n; i++)
        {
            cin >> x;
            arr[i] = x;
            num[x]++;
        }
        for (int i = 0; i < q; i++)      // each query replaces arr[a] with b
        {
            cin >> a >> b;
            num[arr[a]]--;
            if ((num[arr[a]]) == 0)
            {
                num.erase(arr[a]);
            }
            arr[a] = b;
            num[b]++;
            cout << num.size() << "\n";  // current number of distinct elements
        }
    }
    return 0;
}
