Removing duplicate integers from an array in C

I am working on a program that removes duplicate values from an array by sorting it and then removing consecutive duplicate values. First I run a selection sort, and then I call a function removedup() that modifies the array and returns the new size. Then I print the values in the array up to that size. However, when I execute it, it only prints the original array and then a bunch of blank space. Does anyone know why this is occurring?
My code: http://pastebin.com/uTTnnMHN
Just the de-duplication code:
int removedup(int a[])
{
int i, count, j;
count = n;
for (i = 0; i < (n - 1); i++) {
if (a[i] == a[i + 1]) {
for (j = 0; j < (n - i); j++) {
a[j] = a[j + 1];
}
count--;
i--;
}
}
return count;
}

for(j=0;j<(n-i);j++)
This is the loop that shifts your array to the left (thus removing the duplicate value). j should be initialized to i, not to 0, and the condition does not look right either: the loop should stop at n - 1 so that a[j + 1] stays inside the array.
A correct one could be:
for(j=i;j<n-1;j++)
{
a[j]=a[j+1];
}
a[n-1] = 0; // Because you shift your table to the left, the last value should not be used

First, note that if i is 0 and a[i] == a[i+1], then i-- makes i equal to -1 (the loop's i++ brings it back to 0 before the next comparison):
for(i=0;i<(n-1);i++)
{
if(a[i]==a[i+1])
{
for(j=0;j<(n-i);j++)
{
a[j]=a[j+1];
}
count--;
i--; //i=-1 if(a[i]==a[i+1]) && if(i==0)
}
}

In your duplicate removal function, you need to start the moving loop at i, as has been mentioned, and you must use count - 1 as the loop bound for both loops. Otherwise you get an infinite loop whenever there are duplicates, because after the first moving loop a[n-2] == a[n-1] always holds.
int removedup(int a[])
{
int i, count, j;
count = n;
for(i = 0; i < (count-1); i++)
{
if(a[i] == a[i+1])
{
for(j = i; j < (count-1); j++)
{
a[j]=a[j+1];
}
count--;
i--;
}
}
return count;
}
works correctly.

Since you're creating another array anyway, why not simplify your function?
int removedup(int a[], int b[])
{
int i;
int count = 1;
b[0] = a[0];
for(i=1;i<n;i++){
if(a[i-1] != a[i]){
b[count++] = a[i];
}
}
return count;
}
Then when you call the function,
count=removedup(a, OutArray);

int removedup(int a[])
{
int i, count; /* memmove below needs <string.h> */
count = n;
for (i = 0; i < (count-1); ++i) {
if (a[i] == a[i+1]) { /* found a duplicate */
int d = 0; /* count the duplicates */
do {
++d;
} while ((i+1+d) < count && a[i] == a[i+1+d]); /* probe ahead again for a duplicate */
/* at this point we have either run out of duplicates or hit the end of the array */
if ((i+1+d) < count)
memmove(&a[i+1], &a[i+1+d], sizeof(int)*(count-(i+1+d))); /* shift down rest of the array if there's still elements to look at */
count -= d; /* adjust the count down by the number of duplicates */
}
}
return count;
}

What about this one? It does not sort; it traverses the array, marks the duplicates, prints the remaining values of the array and returns their count.
int removedup(int a[])
{
int i, count, j;
count = n;
int b[n];
memset(b,0,sizeof(b));
for (i = 0; i < (n - 1); i++)
{
if (-1 != b[i])
{
for(j=i+1;j<n;j++)
{
if( a[i]==a[j])
{
b[j]=-1;
count--;
}
}
}
}
for(i=0;i<n;i++)
{
if(-1 != b[i])
{
printf("%d ",a[i]);
}
}
return count;
}

Related

Print elements of an array that appear only once (C)

I am having trouble achieving the wanted results. The program should ask for 20 inputs and then go over each to see if they appear more than once. Then only print out those that appeared once.
However currently my program prints out random numbers that are not inputted.
For example:
array = {10,10,11,12,10,10,10.....,10} should return 11 and 12
#include <stdio.h>
void main() {
int count, size=20, array[size], newArr[size];
int number=0;
for(count = 0; count < size; count++) {
// Ask user for input until 20 correct inputs.
printf("\nAnna %d. luku > ", count+1);
scanf("%d", &number);
if( (number > 100) || (number < 10) ) {
while(1) {
number = 0;
printf("Ei kelpaa.\n");//"Is not valid"
printf("Yrita uudelleen > ");//"Try again >"
scanf("%d", &number);
if ( (number <= 100) && (number >= 10) ) {
break;
}
}
}
array[count] = number;
}
for(int i=0; i < size; i++) {
for(int j=0; j<size; j++){
if(array[i] == array[j]){
size--;
break;
} else {
// if not duplicate add to the new array
newArr[i] == array[j];
}
}
}
// print out all the elements of the new array
for(int k=0; k<size; k++) {
printf("%d\n", newArr[k]);
}
}
You don't need the newArr here, or the separate output loop. Only keep a counter that you reset to zero at the beginning of the outer loop, and increase in the inner loop each time array[j] equals array[i] (this includes the element itself, so a unique value ends with a count of 1).
Once the inner loop is finished and the counter is 1, the value has no duplicates and you print it.
In code perhaps something like:
for (unsigned i = 0; i < size; ++i)
{
unsigned counter = 0;
for (unsigned j = 0; j < size; ++j)
{
if (array[i] == array[j])
{
++counter;
}
}
if (counter == 1)
{
printf("%d\n", array[i]);
}
}
Note that the above is a pretty naive and brute-force way to deal with it, and that it will not perform very well for larger array sizes.
For larger arrays one could instead implement a hash table, where the value is the key and the count is the data.
Each time you read a value you increase the count stored for that value.
Once done, iterate over the table and print all values whose count is 1.
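When the values come from a small known range, a plain array of counters indexed by value can play the role of the hash table. The sketch below assumes the 10 to 100 range from the question's input validation; collect_singles, MIN_VALUE and MAX_VALUE are hypothetical names chosen for this illustration, and the function writes its result to an output array instead of printing so that it is easy to check.

```c
#include <stddef.h>

/* Assumed input range, taken from the validation in the question. */
#define MIN_VALUE 10
#define MAX_VALUE 100

/*
 * Copies every value that occurs exactly once in arr into out (in input
 * order) and returns how many values were copied. Values outside the
 * assumed range are ignored. out must have room for size elements.
 */
size_t collect_singles(const int *arr, size_t size, int *out)
{
    unsigned counts[MAX_VALUE - MIN_VALUE + 1] = {0};
    size_t found = 0;

    /* First pass: count how often each value occurs. */
    for (size_t i = 0; i < size; ++i) {
        if (arr[i] >= MIN_VALUE && arr[i] <= MAX_VALUE)
            ++counts[arr[i] - MIN_VALUE];
    }

    /* Second pass: keep the values whose count is exactly 1. */
    for (size_t i = 0; i < size; ++i) {
        if (arr[i] >= MIN_VALUE && arr[i] <= MAX_VALUE &&
            counts[arr[i] - MIN_VALUE] == 1)
            out[found++] = arr[i];
    }
    return found;
}
```

Both passes are linear in the array size, so the nested loop disappears at the cost of a fixed 91-entry counter array.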
Use functions!!
Use proper types for indexes (size_t).
#include <stdio.h>
void printdistinct(const int *arr, size_t size)
{
int dist;
for(size_t s = 0; s < size; s++)
{
int val = arr[s];
dist = 1;
for(size_t n = 0; n < size; n++)
{
if(s != n)
if(val == arr[n]) {dist = 0; break;}
}
if(dist) printf("%d ", val);
}
printf("\n");
}
int main(void)
{
int test[] = {10,10,11,12,10,10,10,10};
printdistinct(test, sizeof(test)/sizeof(test[0]));
fflush(stdout);
}
https://godbolt.org/z/5bKfdn9Wv
This is how I did it, and it should work for you:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdarg.h>
void printdistinct(const int *muis, size_t size);
int main()
{
int loop=20,i,muis[20],monesko=0;
for(i=0; i<loop; i++){
monesko++;
printf ("Anna %d. luku: \n",monesko);
scanf("%d", &muis[i]);
while (muis[i]<10 || muis[i]>100){ // keep asking until the value is in range
printf("Ei kelpaa!\n"); // "Not valid!"
printf("YRITÄ UUDELLEEN:\n "); // "TRY AGAIN:"
scanf("%d", &muis[i]);
}
}
printdistinct(muis, sizeof(muis)/sizeof(muis[0]));
fflush(stdout);
return 0;
}
void printdistinct(const int *muis, size_t size)
{
for(size_t s = 0; s < size; s++)
{
int a = muis[s];
int testi = 1;
for(size_t n = 0; n < size; n++){
if(s != n) {
if(a == muis[n]){
testi = 0;
break;
}
}
}
if(testi) {
printf("%d \n", a);
}
testi = 1;
}
printf("\n");
}
This approach uses some memory to keep track of which elements are duplicates. The memory cost is higher, but the processor time cost is lower. These differences will become significant at higher values of size.
char* duplicate = calloc(size, 1); // values are initialized to zero
for (unsigned i = 0; i < size; ++i)
{
if(!duplicate[i]) // skip any value that's known to be a duplicate
{
for (unsigned j = i + 1; j < size; ++j) // only look at following values
{
if (array[i] == array[j])
{
duplicate[i] = 1;
duplicate[j] = 1; // all duplicates will be marked
}
}
if (!duplicate[i])
{
printf("%d\n", array[i]);
}
}
}
free(duplicate); // release the marker array (calloc and free need <stdlib.h>)
What you can do is initialize a hash map that stores the elements you have already seen. While iterating over the array, you look each element up in the hash map. If it is not present, add it; if it is already present, keep iterating.
This way you do not need a nested loop, and the average time complexity of the algorithm is O(n). Note that the snippet below is C++ (std::unordered_map), not C, and that it prints each distinct value once rather than only the values that occur exactly once.
unordered_map < int, int > map;
for (int i = 0; i < size; i++) {
// Check if present in the hashmap
if (map.find(arr[i]) == map.end()) {
// Insert the element in the hash map
map[arr[i]] = arr[i];
cout << arr[i] << " ";
}
}
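The snippet above relies on C++'s std::unordered_map, which C does not have. As a rough sketch of the same idea in C, the code below builds a small open-addressing table that maps each value to its occurrence count and then keeps the values whose count is 1. The names struct slot, find_slot and singles_hashed are hypothetical, and the hash function is only one simple choice among many.

```c
#include <stdlib.h>

/* One slot of the hypothetical table: value -> occurrence count. */
struct slot {
    int key;
    unsigned count; /* 0 means the slot is still empty */
};

/*
 * Returns the slot that holds key, or the empty slot where it belongs.
 * Uses linear probing; cap must be a power of two and the table must
 * never be completely full.
 */
static struct slot *find_slot(struct slot *table, size_t cap, int key)
{
    size_t i = ((size_t)(unsigned)key * 2654435761u) & (cap - 1);
    while (table[i].count != 0 && table[i].key != key)
        i = (i + 1) & (cap - 1);
    return &table[i];
}

/*
 * Copies every value that occurs exactly once in arr into out (in input
 * order). Returns the number of values copied, or (size_t)-1 if the
 * table could not be allocated. out must have room for size elements.
 */
size_t singles_hashed(const int *arr, size_t size, int *out)
{
    size_t cap = 2;
    while (cap < 2 * size)      /* keep the table at most half full */
        cap *= 2;

    struct slot *table = calloc(cap, sizeof *table);
    if (table == NULL)
        return (size_t)-1;

    /* First pass: count occurrences of each value. */
    for (size_t i = 0; i < size; ++i) {
        struct slot *s = find_slot(table, cap, arr[i]);
        s->key = arr[i];
        s->count++;
    }

    /* Second pass: keep the values that were seen exactly once. */
    size_t found = 0;
    for (size_t i = 0; i < size; ++i) {
        if (find_slot(table, cap, arr[i])->count == 1)
            out[found++] = arr[i];
    }

    free(table);
    return found;
}
```

Unlike a counter array indexed by value, this works for arbitrary int values, including negative ones, and each lookup takes constant time on average.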

Two strings of integers: check if the second one is a substring

#include <stdio.h>
int main() {
int x[100], y[100], i, j, dim1, dim2, subsir=0;
scanf("%d", &dim1);
for(i=0; i<dim1; i++)
{
scanf("%d", &x[i]);
}
scanf("%d", &dim2);
for(j=0; j<dim2; j++)
{
scanf("%d", &y[j]);
}
if(dim1<dim2)
{
printf("no");
return 0;
}
for(i=0; i<dim1; i++)
for(j=0; j<dim2; j++)
{
if(x[i]==y[j])
subsir=1;
}
if(subsir==1)
{
printf("yes");
}
else
{
printf("no");
}
return 0;
}
Consider two strings of integers. Check if the second string is a substring of the first in C.(the substring consists of consecutive elements of the first string)
You can achieve your goal with a brute force approach, trying every position from 0 to dim1 - dim2.
Here is a modified version with error checking:
#include <stdio.h>
int main() {
int x[100], y[100], i, j, dim1, dim2;
if (scanf("%d", &dim1) != 1 || dim1 > 100)
return 1;
for (i = 0; i < dim1; i++) {
if (scanf("%d", &x[i]) != 1)
return 1;
}
if (scanf("%d", &dim2) != 1 || dim2 > 100)
return 1;
/* no need to read second array if larger */
if (dim1 < dim2) {
printf("no\n");
return 0;
}
for (j = 0; j < dim2; j++) {
if (scanf("%d", &y[j]) != 1)
return 1;
}
/* trying every possible sub-array start in array `x` */
for (i = 0; i <= dim1 - dim2; i++) {
/* compare all entries */
for (j = 0; j < dim2; j++) {
if (x[i + j] != y[j])
break;
}
/* if above loop completed, we have a match */
if (j == dim2) {
printf("yes\n");
return 0;
}
}
printf("no\n");
return 0;
}
A possible implementation, based on your approach:
// this function will return `1` in case `arr2` is contained inside `arr1`, otherwise `0`
int is_subarr(int arr1[], size_t arr1_len, int arr2[], size_t arr2_len) {
// as you already did, we first check that arr1's len is not smaller than arr2's
if (arr1_len < arr2_len) {
return 0;
}
// useful check: in many implementations, when `arr2` is empty, they return true
if (arr2_len == 0) {
return 1;
}
// first loop: we go through all elements of `arr1` (but we stop when we have less elements left than `arr2`'s len)
for (size_t i = 0; i + arr2_len <= arr1_len; i += 1) {
// temporary flag, we just pretend that `arr2` is contained, and then we check for the opposite
int eq = 1;
// starting with the current position `i` inside `arr1`, we start checking if `arr2` is contained
for (size_t j = 0; j < arr2_len; j += 1) {
// if it's not contained, we break from this inner loop, and we keep searching through `arr1`
if (arr1[i + j] != arr2[j]) {
eq = 0;
break;
}
}
// in case `arr2` is effectively found, we return `1`
if (eq == 1) {
return 1;
}
}
return 0;
}
NOTE: "strings of integers" are called "arrays of integers" in C; a "string" is an "array of chars", instead.

C - Remove Duplicates from an Array

I'm quite new to programming. I wrote some code to remove duplicates from an array; logically it should work, but it doesn't. I traced through it multiple times and it made sense.
Here's the code:
#include <stdio.h>
int rmDuplicates(int arr[], int n)
{
int i, j;
for (i = 0; i < n; i++) {
if (arr[i] == arr[i + 1]) {
for (j = i + 1; j < n - 1; j++) {
arr[j] = arr[j + 1];
}
n--;
}
return n;
}
}
int main()
{
int n, i;
scanf("%d", &n);
int arr[n];
for (i = 0; i < n; i++) {
scanf("%d", &arr[i]);
}
n = rmDuplicates(arr, n);
for (i = 0; i < n; i++) {
printf("%d", arr[i]);
}
printf("\n%d", n);
return 0;
}
Your "return n" is in the wrong place, and returns after the first cycle.
for(i=0;i<n;i++) {
if(arr[i] == arr[i+1]) {
for(j=i+1;j<n-1;j++) {
arr[j] = arr[j+1];
}
n--;
}
return n; // <---- this
}
// <-- should be here.
As confirmation, if I move the return n; outside the loop, the code works. But it only removes consecutive duplicates, because you only check arr[i] against its consecutive, arr[i+1].
(Also, the cycle ought to stop at n-1, because otherwise arr[n-1+1] is arr[n] which is outside the array).
A final issue is that if you have, say,
                   n
...5,..., 5, 5, 6
   i      j
and you check the first 5 against the second, and find it a duplicate, then shift all that follows by one step, in the j-th position you will have a 5 again, but j will now be incremented and you will test the first 5 against the 6 instead of the third 5, not finding the duplicate:
                n
...5,..., 5, 6
   i         j
For this reason, when you find a match, you need to rewind j by one and repeat that test:
int rmDuplicates(int arr[], int n) {
int i,j,k;
for (i=0;i<n-1;i++) {
for (j=i+1; j < n; j++) {
if(arr[i] == arr[j]) {
n--;
for (k=j;k<n;k++) {
arr[k] = arr[k+1];
}
j--;
}
}
}
return n;
}
From a performance point of view, the above algorithm is O(n^2), that is, if the array length doubles, the algorithm takes four times as long; if it trebles, it takes nine times as long.
A better algorithm would therefore be to first sort the array in-place, so that 1 3 2 7 2 3 5 becomes 1 2 2 3 3 5 7 (this has a cost of O(n log n), which grows more slowly); then you just "compress" the array, skipping duplicates, which is O(n) and gets you 1 2 3 5 7:
int i, j;
for (i = 0, j = 1; j < n;) {
if (arr[i] == arr[j]) {
j++;
continue;
}
i++;
if (j != (i+1)) {
arr[i] = arr[j];
}
j++;
}
n = i+1;
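Put together, the sort-then-compress idea might look like the sketch below. It uses the standard library's qsort for the sorting step, which is an assumption of this example (the answer does not name a sort), and sort_and_compress and compare_ints are hypothetical names.

```c
#include <stdlib.h>

/* Comparison callback for qsort; avoids the overflow risk of "x - y". */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Sorts arr in place, squeezes out duplicates, and returns the number of
 * distinct values, which end up in arr[0 .. result-1].
 */
size_t sort_and_compress(int *arr, size_t n)
{
    if (n == 0)
        return 0;

    qsort(arr, n, sizeof arr[0], compare_ints);

    size_t last = 0;                 /* index of the last value kept */
    for (size_t j = 1; j < n; j++) {
        if (arr[j] != arr[last])     /* new value: keep it */
            arr[++last] = arr[j];
    }
    return last + 1;
}
```

The trade-off is that the original order of the elements is lost; if the order matters, the quadratic version above is the simpler choice for small arrays.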
size_t removeDups(int *arr, size_t size)
{
if(arr && size > 1)
{
for(size_t current = 0; current < size - 1; current++)
{
size_t original_size = size;
size_t copypos = current + 1;
for(size_t cpos = current + 1; cpos < original_size; cpos++)
{
if(arr[current] == arr[cpos])
{
if(cpos < original_size -1)
{
if(arr[current] != arr[cpos + 1])
{
arr[copypos++] = arr[cpos + 1];
cpos++;
}
}
size--;
}
else
{
arr[copypos++] = arr[cpos];
}
}
}
}
return size;
}
int main(void)
{
int arr[] = {1,1,1,2,2,3,3,4,5,6,7,1,8,8,2,2,2,2};
size_t size = sizeof(arr) / sizeof(arr[0]);
size = removeDups(arr, size);
for(size_t index = 0; index < size; index++)
{
printf("%d\n", arr[index]);
}
}

I want to print the number of times an element is repeated in an array of random numbers in C

Hi, I wanted to display the repeated elements of a random array whose size can be specified by the user. The problem with my output is that the function prints a repeated number as many times as it is repeated, but I want to print it only once.
Here is my code, followed by its output:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int array_size = 0;
int *my_array;
int i = 0;
printf("Enter the size of the array:\n");
scanf("%d",&array_size);
my_array = malloc(array_size * sizeof my_array[0]);
if(NULL == my_array) {
fprintf(stderr,"MEMORY ALLOLCATION FAILED \n");
return EXIT_FAILURE;
}
for(i=0;i<array_size;i++){
my_array[i] = rand()%array_size;
}
printf("What's in the array:\n");
for(i = 0;i<array_size;i++){
printf("%d ",my_array[i]);
}
printf("\n");
display_repeats(my_array, array_size);
free(my_array);
return EXIT_SUCCESS;
}
void display_repeats(int *a,int n){
int *repeats;
repeats = malloc(n * sizeof repeats[0]);
int i=0;
int j=0;
int count = 0;
for(i=0;i<n;i++){
for(j=0;j<n;j++){
if(a[i] == a[j]){
count++;
}
}
if(count>1){
repeats[i] = count;
printf("%d occurs %d times\n",a[i],repeats[i]);
}
count = 0;
}
free(repeats);
}
Here is the output I am getting
Enter the size of the array:
5
What's in the array:
3 1 2 0 3
3 occurs 2 times
3 occurs 2 times
I want "3 occurs 2 times" to print once.
Please help!
What's happening is that your code realizes 3 is repeated twice, in this block:
for(i=0;i<n;i++){
for(j=0;j<n;j++){
if(a[i] == a[j]){
count++;
}
}
if(count>1){
repeats[i] = count;
printf("%d occurs %d times\n",a[i],repeats[i]);
}
}
When i is equal to 0, it will look at the array and realize that 3 is repeated, so count>1 will be true. Then, when i is equal to 4, count>1 will be true again, and you get the double print.
In order to fix this, I would create an array that stores the numbers that have already been verified as repeated and check against that.
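One possible reading of that suggestion is sketched below: a second array remembers which values have already been reported, and each candidate is checked against it before counting. The names display_repeats_once, contains and reported are hypothetical, and returning the number of distinct repeated values is an addition made here so the result can be checked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if value is among the first len entries of list, else 0. */
static int contains(const int *list, int len, int value)
{
    for (int i = 0; i < len; i++) {
        if (list[i] == value)
            return 1;
    }
    return 0;
}

/*
 * Prints each repeated value of a exactly once, with its count.
 * Returns the number of distinct repeated values, or -1 if the helper
 * array could not be allocated.
 */
int display_repeats_once(const int *a, int n)
{
    /* Values that have already been reported as repeated. */
    int *reported = malloc((size_t)(n > 0 ? n : 1) * sizeof reported[0]);
    int reported_len = 0;

    if (reported == NULL)
        return -1;

    for (int i = 0; i < n; i++) {
        if (contains(reported, reported_len, a[i]))
            continue;                /* already printed this value */

        int count = 0;
        for (int j = 0; j < n; j++) {
            if (a[i] == a[j])
                count++;
        }
        if (count > 1) {
            printf("%d occurs %d times\n", a[i], count);
            reported[reported_len++] = a[i];
        }
    }

    free(reported);
    return reported_len;
}
```

For the array 3 1 2 0 3 from the question, this prints "3 occurs 2 times" a single time.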
Since in your case every element value is smaller than the array size, you can use the values themselves as indexes into repeats:
memset(repeats, 0, n*sizeof(repeats[0])); /* needs <string.h> */
for(i=0;i<n;i++){
if(repeats[a[i]]>0) /* this value has already been counted */
continue;
count = 0;
for(j=0;j<n;j++){
if(a[i] == a[j]){
count++;
}
}
repeats[a[i]] = count;
if(count>1)
printf("%d occurs %d times\n",a[i],count);
}
The idea is to store the count for each value and, before searching for a value, check whether it has already been counted.
Try this:
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
That way an element is never compared with itself, so a single match already means the value is repeated (test count > 0 instead of count > 1).
Thanks guys I got it working with the above ideas!
here is what I got..
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int array_size = 0;
int *my_array;
int i = 0;
printf("Enter the size of the array:\n");
scanf("%d",&array_size);
my_array = malloc(array_size * sizeof my_array[0]);
if(NULL == my_array) {
fprintf(stderr,"MEMORY ALLOLCATION FAILED \n");
return EXIT_FAILURE;
}
for(i=0;i<array_size;i++){
my_array[i] = rand()%array_size;
}
printf("What's in the array:\n");
for(i = 0;i<array_size;i++){
printf("%d ",my_array[i]);
}
printf("\n");
display_repeats(my_array, array_size);
free(my_array);
return EXIT_SUCCESS;
}
void display_repeats(int *a,int n){
int *freq;
freq = malloc(n * sizeof freq[0]);
int i=0;
int j=0;
int count = 0;
for(i=0;i<n;i++){
freq[i] = -1;
}
for(i=0;i<n;i++){
count = 1;
for(j=i+1;j<n;j++){
if(a[i]==a[j]){
count++;
freq[j] = 0;
}
}
if(freq[i]!=0){
freq[i] = count;
}
}
for(i=0;i<n;i++){
if(freq[i]!=1 && freq[i]!=0){
printf("%d occurs %d times\n",a[i],freq[i]);
}
}
free(freq);
}
In display_repeats(), you may simply update 2 lines and add 3 lines to make your program behave correctly. As we will see later, we can then shorten the final code using C99 syntax and the C comma operator, ending up with far fewer lines than your original code (9 lines instead of 20 lines!).
So, in this answer, you will find how to get to the final code, which behaves correctly and is shorter than your initial code:
void display_repeats(int *a, int n) {
for (int i = 0; i < n; i++) {
if (a[i] == -1) continue;
int count = 1;
for (int j = i + 1; j < n; j++)
if (a[i] == a[j]) count++, a[j] = -1;
if (count > 1) printf("%d occurs %d times\n", a[i], count);
}
}
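The idea is to set to -1 each array value that has matched a previous one, just after the count increment, because you do not want to count this value again. This works here because every element is rand()%array_size and therefore never negative; note that it also overwrites the caller's array.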
So simply do the following:
In the two lines where you write count = 0, replace 0 by 1, because you are sure that each number in the list must be counted at least one time.
This way, you can avoid checking the case where i equals j in the inner loop: it is already taken into account in count. So add if (i == j) continue; at the beginning of the inner loop.
With the previous updates, you are now sure that when you increment count, in the inner loop, j is not equal to i. Therefore, you can change the value of a[j] without changing a[i], in the array.
So, add a[j] = -1; just after having increased count. This way, when i will be incremented to check for a new count of a new value, it is impossible that the new counted value has already been counted.
Finally, you do not want to count how many times -1 is in the array. But you have replaced some values with -1. So simply add if (a[i] == -1) continue; at the beginning of the outer loop to avoid counting how many -1 values there are in the array.
This intermediate code is:
void display_repeats(int *a,int n) {
int *repeats;
repeats = malloc(n * sizeof repeats[0]);
int i=0;
int j=0;
int count = 1;
for(i=0; i<n; i++) {
if (a[i] == -1) continue;
for(j=0; j<n; j++) {
if (i == j) continue;
if(a[i] == a[j]) {
count++;
a[j] = -1;
}
}
if(count > 1) {
repeats[i] = count;
printf("%d occurs %d times\n",a[i],repeats[i]);
}
count = 1;
}
free(repeats);
}
Now, we can optimize this code.
First, we can avoid testing a[i] == a[j] when j <= i: if such a case happens, we know that we have previously displayed the count for a[j] (the number of times a[j] appears in the array). Therefore we can replace for (j=0; j < n; j++) { by for (j=i+1; j<n; j++) {.
With this last update, we know that the first line of the inner loop will never match: i can not be equal to j. So we can remove this line (if (i == j) continue;).
Note that the values stored in the repeats array are only used to get the value of count in the printf() call. So we can remove every reference to the repeats array and simply use count in the printf() call.
Now, you can see that we set count to 1, two times. We can do it only one time, if we do not set it to 1 at the end of the main loop, to prepare a new loop, but at the beginning.
Now, note that i, j and count have the same type, so a single line can declare them: int i = 0, j = 0, count = 1;. Moreover, i and j are assigned later in the for loops, so there is no need to give them an initial value here. So we can simply write int i, j, count = 1;. But count is now assigned at the beginning of the outer loop, so it does not need an initial value either. So we only need to declare i, j and count without an initial value: int i, j, count;.
The new intermediate code is:
void display_repeats(int *a, int n) {
int i, j, count;
for (i = 0; i < n; i++) {
if (a[i] == -1) continue;
count = 1;
for(j = i + 1; j < n; j++) {
if (a[i] == a[j]) {
count++;
a[j] = -1;
}
}
if (count > 1) printf("%d occurs %d times\n", a[i], count);
}
}
Using C99 syntax
But we can do more, using the C99 specification: we can define a variable with the for instruction: we can write for (int i = ...). No more need to define it previously. So we can avoid having to write int i, j, count;, we will define them at first use:
void display_repeats(int *a, int n) {
for (int i = 0; i < n; i++) {
if (a[i] == -1) continue;
int count = 1;
for(int j = i + 1; j < n; j++) {
if (a[i] == a[j]) {
count++;
a[j] = -1;
}
}
if (count > 1) printf("%d occurs %d times\n", a[i], count);
}
}
Using the comma C operator
Again, we can make it shorter! We can use the comma operator (,): it is a binary operator that evaluates its first operand, discards the result, evaluates the second operand and returns its value. Using the comma operator, we can transform count++; a[j] = -1; into a single statement: a[j] = (count++, -1). But we can avoid the parentheses by writing count++, a[j] = -1. Now, we do not need a block for the if statement, since there is only one statement. Therefore, we can remove a lot of braces.
The final code is:
void display_repeats(int *a, int n) {
for (int i = 0; i < n; i++) {
if (a[i] == -1) continue;
int count = 1;
for (int j = i + 1; j < n; j++)
if (a[i] == a[j]) count++, a[j] = -1;
if (count > 1) printf("%d occurs %d times\n", a[i], count);
}
}

Array sorting and removing duplicates

I am trying to sort an array and remove duplicates. This is the function I am using in C.
The code compiles without errors but gives me wrong output: there are 0's in the output array, whereas there were no zeros originally.
sort(int tab[], int k)
{
int temp,i,j,m;
for(i=0; i<k; i++){
for(j =i+1; j<k; j++)
{
if(tab[i] > tab[j])
{
int temp = tab[i];
tab[i]=tab[j];
tab[j]=temp;
}
else if (tab[i] == tab[j]){
for (m =j; m<k; m++){
tab[m] = tab[m+1];
}
}
}
}
}
What is the logical error in this code? I am getting 0's in my output.
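Your code is nearly correct, but you forgot to do a k--; since you remove a duplicate each time, the size decreases by 1.
The shift loop must also stop at k-1: with m<k the last iteration reads tab[k], which lies outside the array and is a likely source of the stray 0's. After removing tab[j], step j back so that the element that moved into its place is tested too, and return the new k so the caller knows how many elements remain.
int sort(int tab[], int k)
{
int i,j,m;
for( i=0;i<k;i++)
for( j =i+1;j<k;j++)
{
if(tab[i]>tab[j])
{
int temp = tab[i];
tab[i]=tab[j];
tab[j]=temp;
}
else if (tab[i]==tab[j])
{
for ( m =j ; m<k-1;m++) /* stop at k-1 so tab[m+1] stays inside the array */
tab[m]=tab[m+1];
k--;
j--; /* re-test the element that moved into position j */
} //end of else if
}// end of for
return k; /* new length after removing duplicates */
}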
