Optimal method for creating a 2D Array without any repetitions - c

I'm trying to write code that fishes out records from a list of about 200k to 1 million records, so I want this process to be as fast as possible. The basic idea is as follows: the records in the large list are combinations of numbers that are to be kept together. For example:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400013,400076,800013,800076
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
The maximum length of a record is 20, which is why there are additional zeroes. Let's not worry about these for a moment. I want to "fish" out records such that no value is repeated: if a record contains a value that already appears in a previously selected record, I discard that record and do not look at it again. Thus, I must compile a list which looks like this:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
Note how in the above list, record no. 8 is missing because the number 400076 already exists in a previous record.
The code I am using to do this is as follows:
void Make_List(ConfigList *pathgroups, ConfigList *configlist)
{
    int i,j,k,l,flag,pg_num=0,len,p_num=0;
    for(i = 0;i<configlist->num_total;i++)
    {
        flag = 0;
        for(j = configlist->configsize-1;j>=0;j--)
        {
            if(configlist->pathid[i][j])
            {
                for(k = 0;k<pg_num;k++)
                {
                    for(l = pathgroups->configsize-1;l>=0;l--)
                    {
                        if(pathgroups->pathid[k][l])
                        {
                            if(configlist->pathid[i][j]==pathgroups->pathid[k][l])
                            {
                                flag++;
                                break;
                            }
                        }
                        else
                        {
                            break;
                        }
                    }
                    if(flag)
                    {
                        break;
                    }
                }
            }
            else
            {
                break;
            }
            if(flag)
            {
                break;
            }
        }
        if(!flag)
        {
            len = 0;
            for(j = configlist->configsize-1;j>=0;j--)
            {
                pathgroups->pathid[pg_num][j]=configlist->pathid[i][j];
                if(configlist->pathid[i][j])
                {
                    len++;
                }
            }
            pg_num++;
            p_num+=len;
            if(p_num>=totpaths)
            {
                break;
            }
        }
    }
    Print_ConfigList(stderr,pathgroups);
}
The structure ConfigList stores the 2D array along with other things used in different parts of the program.
num_total is the number of rows in the array, whereas configsize is the number of columns.
totpaths is a threshold that terminates the loop early once the assignment is complete.

Checking if each element is repeated for each new element analyzed has a computational cost of O(N^2) which, given your large input set, is far too much.
Basically, what you need is a fast-access data structure where you can keep a count of how many times each value has appeared, or at least a boolean flag.
The easiest way to do this is to use an array where each position represents a possible value and the array element holds the number of times that value has appeared (or a boolean flag for its presence). However, if your data range is too large you can't do this, because the memory needed for the array is proportional to the size of the range.
The alternative that avoids this is to use hash tables or sets.
As you have established in your comments above, your integer range is [0, 99999999], so if you used an array with one byte per value to track the presence of each single value, you would need approximately 96 MB of memory.
This is an example using byte arrays:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_IN_RANGE 99999999
int main()
{
    char * isInInput = (char*)malloc(MAX_IN_RANGE+1);
    if (isInInput == NULL) // about 96 MB may not be available
    {
        return 1;
    }
    memset(isInInput,0,MAX_IN_RANGE+1);
    size_t i;
    int inputExample[] = {1,3,5,2,1,5};
    for(i = 0; i < 6; i++)
    {
        int value = inputExample[i];
        printf("%d\n",value);
        if(!isInInput[value])
        {
            printf("Add value %d to your collection\n", value);
            isInInput[value] = 1; // mark the value as seen
        }
        else
        {
            printf("%d is repeated\n", value);
        }
    }
    free(isInInput);
    return 0;
}
To use a hash table instead, you can rely on a library such as Judy (a sparse dynamic array library) rather than implementing your own.

Related

Write a program in Java that accepts n elements into an array and then displays only the duplicate elements

Write a program in Java that accepts n elements into an array and then displays only the duplicate elements.
For example, if we enter 1 2 3 5 2 3 4 2 6 7 8 8 9 8,
it should show 2 3 8.
I have tried this, but it is very long:
import java.util.*;
class JavaApplication1 {
    public static void main(String[] args) {
        Scanner sc=new Scanner(System.in);
        System.out.print("Enter Size: ");
        int s=sc.nextInt();
        int a[]=new int[s];
        int t[]=new int[s];
        int p=0;
        System.out.println("Enter numbers :-");
        for(int i=0;i<s;i++)
        {
            a[i]=sc.nextInt();
        }
        for(int i=0;i<s;i++)
        {
            boolean flag=false;
            for(int j=i+1;j<s;j++)
            {
                if(a[i]==a[j])
                {
                    flag=true;
                    break;
                }
            }
            if(flag==true)
            {
                t[p]=a[i];p++;
            }
        }
        System.out.print("duplicate elements are:- ");
        for(int i=0;i<p;i++)
        {
            boolean flag=false;
            for(int j=i+1;j<p;j++) // only the first p entries of t are filled
            {
                if(t[i]==t[j])
                {
                    flag=true;
                    break;
                }
            }
            if(flag==false)
            {
                System.out.print(t[i]+" ");
            }
        }
    }
}
I'll tell you the keypoints of the solution I'd implement.
Don't force the user to specify the size of the array beforehand: just tell them to press "q" to finish.
Using the array index to represent the numbers is not a bad idea as long as you are the only programmer. If not, I'd use a dictionary (a Map in Java) of "element":"number of repetitions". That would also let you add letters to your program.
Finally, you need to make it more modular. First, create a function that reads all the numbers and returns an array. Then create a function that filters that array and returns the repeated elements. Finally, create a function that prints that array. In your main you should only call those three functions.
You can create a Set and, as you go through the numbers the first time, add each one to it. Because sets do not allow duplicates, Set.add returns false when the value is already in the set, so every failed add means you have found a repeated number; put those into a second set. This way you only have to loop through the numbers once and, at the end, display the numbers contained in the second set. I hope that helps.
One possibility is to sort the array a, e.g., using java.util.Arrays.sort(a);. Then just iterate through a and check for repeating numbers. That part could look like this:
Arrays.sort(a);
Integer prev = null;
Integer printed = null;
for (int val : a) {
    if (prev != null && val == prev && (printed == null || val != printed)) {
        System.out.print(val + " ");
        printed = val;
    }
    prev = val;
}
This is quite an ugly program and solution (my code). While it works from an algorithmic point of view, I do not recommend it as good programming practice; for instance, the null checks. It is also advisable to move the detection of duplicate numbers into a separate method (or better, a class) that does no printing. Mixing computation with the user interface is bad practice.

How to add elements from one array into another array of an undefined size based on a condition?

I have been teaching myself C for just a few weeks, and am attempting to write code that lets the user decide the size and elements of an array, which is then separated into two arrays: one for odd numbers and one for even numbers.
I am pretty sure that dynamic allocation has something to do with this, but I am unsure of how to implement it. Here is the code so far:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    //User decides the size of the array of numbers-------------------------------
    int n;
    printf("How many numbers? ");
    scanf("%d",&n);
    //User inputs values into array the size of array[n]--------------------------
    int i;
    int array[n];
    printf("What are the numbers?\n");
    for(i=0; i<n; i++)
    {
        scanf("%d",&array[i]);
    }
    //loop goes through array, separates even and odds into 2 new arrays----------
    //use dynamic allocation??
    for(i=0;i<n;i++)
    {
        int *evenarray = malloc(sizeof(evenarray)); //not sure if this is setup correctly
        int *oddarray = malloc(sizeof(oddarray)); //not sure if this is setup correctly
        if(array[i] % 2 == 0) //if value in array CAN be divided by 2
        {
            printf("Test statement.\n");
        }
        else //if this is not true, append to odd array
        {
            printf("Second test statement.\n");
        }
    }
}
/*this program accepts a user chosen number of numbers
then, the program separates the odd and even numbers into
two different arrays*/
There is no magic way to get this information in one shot. However, you can try either of the following:
1. Loop over the first array to count the odd (or even) numbers. Then you know how many elements need memory, and you can use either a VLA (like the first array itself) or a pointer with the allocator functions.
--> However, in this approach you have to perform the odd/even check twice: once to count the odd/even numbers, and again to actually decide where to copy them.
2. Allocate two chunks of memory the same size as the first array, and fill the odd and even elements into them respectively. After all the elements are stored, use the counts to realloc() each chunk to its exact size.
--> In this case, the memory has to be pre-allocated, but the odd/even check is carried out only once.
You could copy into the odd/even arrays and keep separate counters to track them. Note that malloc(sizeof(evenarray)) only allocates room for one pointer, so each new array must be sized for up to n ints. For example:
//loop goes through array, separates even and odds into 2 new arrays----------
//each new array may need room for all n numbers in the worst case
int evencount =0;
int oddcount =0;
int *evenarray = malloc(n * sizeof *evenarray);
int *oddarray = malloc(n * sizeof *oddarray);
if(evenarray == NULL || oddarray == NULL)
{
    printf("Out of memory.\n");
    free(evenarray);
    free(oddarray);
    return 1;
}
for(i=0;i<n;i++)
{
    if(array[i] % 2 == 0) //if value in array CAN be divided by 2
    {
        printf("Printing to even array.\n");
        evenarray[evencount] = array[i];
        evencount++;
    }
    else //if this is not true, append to odd array
    {
        printf("Printing to odd array.\n");
        oddarray[oddcount] = array[i];
        oddcount++;
    }
}
printf("evenarray = ");
for(i=0;i<evencount;i++){
    printf("%d, ", evenarray[i]);
}
printf("\n");
printf("oddarray = ");
for(i=0;i<oddcount;i++){
    printf("%d, ", oddarray[i]);
}
printf("\n");
free(evenarray);
free(oddarray);

What's the point of using linear search with sentinel?

My goal is to understand why linear search with a sentinel is preferred over a standard linear search.
#include <stdio.h>
int linearSearch(int array[], int length) {
    int elementToSearch;
    printf("Insert the element to be searched: ");
    scanf("%d", &elementToSearch);
    for (int i = 0; i < length; i++) {
        if (array[i] == elementToSearch) {
            return i; // I found the position of the element requested
        }
    }
    return -1; // The element to be searched is not in the array
}
int main() {
    int myArray[] = {2, 4, 9, 2, 9, 10};
    int myArrayLength = 6;
    linearSearch(myArray, myArrayLength);
    return 0;
}
Wikipedia mentions:
Another way to reduce the overhead is to eliminate all checking of the loop index. This can be done by inserting the desired item itself as a sentinel value at the far end of the list.
If I implement linear search with a sentinel, I have to write
array[length] = elementToSearch;
However, my loop already stops checking the elements of the array once the element being searched for is found. What's the point of using linear search with a sentinel?
A standard linear search goes through the elements, checking the array index on every iteration to see whether it has reached the last element. That is what your code does.
for (int i = 0; i < length; i++) {
    if (array[i] == elementToSearch) {
        return i; // I found the position of the element requested
    }
}
The idea of sentinel search is to place the element being searched for at the end so that the index check can be skipped, which removes one comparison per iteration.
while(a[i] != element)
    i++;
First, let's turn your example into a solution that uses sentinels.
#include <stdio.h>
int linearSearch(int array[], int length, int elementToSearch) {
    int i = 0;
    array[length] = elementToSearch;
    while (array[i] != elementToSearch) {
        i++;
    }
    return i;
}
int main() {
    int myArray[] = {2, 4, 9, 2, 9, 10, -1};
    int myArrayLength = 6;
    int mySearch = 9;
    printf("result is %d\n", linearSearch(myArray, myArrayLength, mySearch));
    return 0;
}
Notice that the array now has an extra slot at the end to hold the sentinel value. (If we don't do that, the behavior of writing to array[length] is undefined.)
The purpose of the sentinel approach is to reduce the number of tests performed for each loop iteration. Compare:
// Original
for (int i = 0; i < length; i++) {
    if (array[i] == elementToSearch) {
        return i;
    }
}
return -1;
// New
while (array[i] != elementToSearch) {
    i++;
}
return i;
In the first version, the code is testing both i and array[i] for each loop iteration. In the second version, i is not tested.
For a large array, the performance difference could be significant.
But what are the downsides?
The result when the value is not found is different; -1 versus length.
We have to make the array bigger to hold the sentinel value. (And if we don't get it right we risk clobbering something on the stack or heap. Ouch!)
The array cannot be read-only. We have to be able to update it.
This won't work if multiple threads are searching the same array for different elements.
Using a sentinel value lets you remove the variable i, along with its comparison and increment.
In your linear search, the loop looks like this:
for (int i = 0; i < length; i++) {
    if (array[i] == elementToSearch) {
        return i; // I found the position of the element requested
    }
}
So the variable i is introduced and initialized; in each iteration of the loop it is compared, incremented, and used to locate the next element in the array.
Also, the function really has three parameters if the searched value is passed to it:
int linearSearch(int array[], int length, int value) {
    //...
With a sentinel value, the function can be rewritten as follows:
int * linearSearch( int array[], int value )
{
while ( *array != value ) ++array;
return array;
}
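And in the caller, assuming size is the total number of elements including the sentinel slot and the sentinel is stored in array[size - 1], you can check whether the array contains the value like this: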
int *target = linearSearch( array, value );
int index = target == array + size - 1 ? -1 : target - array;
If you add the value to search for as a sentinel, you can remove one comparison per iteration, which reduces the running time.
The loop may look like for(i = 0;;i++) if(array[i] == elementToSearch) return i;.
If you append the value to search for at the end of the array, then instead of using a for loop with initialization, condition and increment, you can use a simpler loop like
while (array[i++] != elementToSearch)
    ;
Then the loop condition is the check for the value you search for, which means less code to execute inside the loop. Because of the post-increment, i ends up one past the index of the match.
The point is that you can convert the for loop into a while or do-while loop. Notice how you are checking i < length each time. If you convert it,
do {
} while (array[i++] != elementToSearch);
then you don't have to do that extra check. (In this case, the array needs one extra element to hold the sentinel.)
Although the sentinel approach seems to shave off a few cycles per iteration in the loop, this approach is not a good idea:
the array must be defined with an extra slot and passing its length as 1 less than the defined length is confusing and error prone;
the array must be modifiable;
if the search function modifies the array to set the sentinel value, this constitutes a side effect that can be confusing and unexpected;
the search function with a sentinel cannot be used for a portion of the array;
the sentinel approach is inherently not thread-safe: searching the same array for 2 different values in 2 different threads would not work, whereas searching a constant, read-only array from multiple threads would be fine;
the benefits are small and only for large arrays. If this search becomes a performance bottleneck, you should probably not use linear scanning. You could sort the array and use a binary search or you could use a hash table.
optimizing compilers for modern CPUs can generate code where both comparisons will be performed in parallel, hence incur no overhead;
As a rule of thumb, a search function should not have side effects. This is a good example of the principle of least surprise.

C how do I check if an array is full?

I have an array: array[3][3]
I will let the user input data into the array as long as it is not full. As soon as the array gets full I want to stop the user from inserting more data into it.
C has no array bounds checking. You have to do it yourself. Use a variable to keep track of how many items you have inserted, and stop when your counter is equal to the array size.
You cannot check whether an array "is full". To do what you want, keep track of the index while adding elements to the array.
You need to keep track of the data inserted by the user. When your counter reaches the size of the array, the array is full. :D There is no other way to achieve this in C, as it does not provide any means of determining how many elements of an array are in use.
Introduce a variable for counting the number of cells filled. You adjust this variable whenever you add/remove data to your array. Then, in order to check if your array is full, just check if this variable is equal to the total number of cells in your array.
The simplest way, in my opinion, is to allocate the memory dynamically with calloc(), which initializes all the array elements to zero. Then you can check whether the array is full by checking whether the last element is still zero: if it is, the array is not full. Of course, this only works if the elements are filled in order and zero is never a valid input.
First of all, this is not a plain array but a matrix (aka an array of arrays). You already know that the matrix has dimensions 3x3, so you could do something like this:
int x, y;
int array[3][3];
for (x = 0; x < 3; x++)
{
    for (y = 0; y < 3; y++)
    {
        //I assume that it is an array of int
        printf("Insert number at %d - %d: " ,x,y);
        scanf("%d" ,&array[x][y]);
    }
}
Now the user can only insert 3*3=9 values.
There is no way to bounds-check an array in C. However, with better coding practice we can check whether the array is full.
For example, suppose there is some particular value you never want in your array a[3][3]. That value could be anything: 0xFF, 0, or anything else in the integer range. You have to make sure that value is never given as input to the array; then you can verify whether your array a[3][3] is full!
#include <stdio.h>

/* row and column must be global variables in this example! */
int row, column;

/* CheckArray code: returns 1 and stores the first empty cell in row/column,
   or returns 0 if the array is full */
int CheckArray(int a[3][3])
{
    int i, j;
    for(i=0; i<3; i++)
    {
        for(j=0;j<3;j++)
        {
            if(a[i][j] == 0) // assuming that 0 will never be a part of array a[3][3]
            {
                row = i;
                column = j;
                return 1;
            }
        }
    }
    //if code reaches here then array is full!
    printf("Array is full.\n");
    return 0;
}

int main(void)
{
    int a[3][3];
    int i, j;
    /*here initialize the array with 0, assuming that 0 will never be a part of array a[3][3]*/
    for(i=0; i<3; i++)
    {
        for(j=0;j<3;j++)
        {
            a[i][j] = 0;
        }
    }
    while(CheckArray(a)!=0)
    {
        printf("give array input:\n");
        if(scanf("%d", &a[row][column]) != 1) //writing to empty cell of an array
            break; // stop on invalid input or end of input
    }
    return 0;
}
You can still optimize the above code! This is just one way of checking whether the array is full with better coding practice.
If possible, you could initialize all elements in the array to a value that your program would otherwise consider "illegal" or "invalid". E.g. an array of positive numbers can be initialized to all -1's, or an array of chars can be initialized to all null characters ('\0').
Then just check whether the last element still holds the default value (this assumes the elements are filled in order).
measurement = sizeof(myarray) / sizeof(myarray[0]); //or just a constant
while(myarray[measurement-1] == defaultvalue){
    //insert code here...
}
Encapsulate the behavior into a struct with getter/setter functions that check for the max length of the desired vector:
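#include <stdlib.h>

typedef struct varvector varvector;
struct varvector {
    int length;
    char *vector;   /* char storage, matching the char values used by varvector_set */
};
varvector* varvector_create(int length) {
    varvector* container = malloc(sizeof(varvector));
    if (container == NULL) {
        return NULL;
    }
    container->vector = malloc(length);
    if (container->vector == NULL) {
        free(container);
        return NULL;
    }
    container->length = length;
    return container;
}
void varvector_destroy(varvector* container) {
    free(container->vector);
    free(container);
}
/* Returns the element at position, or '\0' if position is out of range. */
char varvector_get(const varvector* container, int position) {
    if (position >= 0 && position < container->length) {
        return container->vector[position];
    }
    return '\0';
}
/* Returns 1 if value was stored, 0 if position is out of range (the vector is "full"). */
int varvector_set(varvector* container, int position, char value) {
    if (position >= 0 && position < container->length) {
        container->vector[position] = value;
        return 1;
    }
    return 0;
}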
Object Oriented Programming is a design pattern which happens to have syntactic support in some languages and happens to not have syntactic support in C.
This does not mean that you cannot use this design pattern in your work, it just means you have to either use a library that already provides this for you (glib, ooc) or if you only need a small subset of these features, write your own basic functions.
You can make sure that the element after the last used one always has id=0.
Then, in the add function, look for the first element with id=0.
struct element {
    int id;   /* 0 marks the first unused slot */
    /* ...other fields... */
};
int add(const struct element *source, struct element *target, int size) {
    int index = -1;
    for (int i = 0; i < size; i++) {
        if (target[i].id == 0) {
            index = i;
            break;
        }
    }
    if (index >= 0) {
        // keep an element with id=0 right after the new last element
        if (index < size - 1) target[index + 1].id = 0;
        target[index] = *source; // add your element here (source->id must not be 0)
        return index;
    }
    return -1; // no element with id=0 found: the array is full
}

BubbleDown operation on binary min-heap does not work

I'm trying to extract the minimum from a binary heap but it's not working. Here's my BubbleDown code:
void heapBubbleDown(Heap * const heap, int idx) {
    int min;
    while(RIGHT(idx) < heap->count) {
        min = LEFT(idx);
        if(RIGHT(idx) < heap->count) {
            if(heap->items[LEFT(idx)] > heap->items[RIGHT(idx)]) {
                min = RIGHT(idx);
            }
        }
        heapSwapValue(&(heap->items[idx]), &(heap->items[min]));
        idx = min;
    }
}
It looks like it only swaps a few numbers but not all of them, and I can't understand why. I have already tried rewriting it differently many times...
What am I doing wrong?
I don't think the problem is that it swaps too few elements. You have to stop when the smallest child is >= the current item.
I would rewrite the last two lines to:
    if (heap->items[idx] > heap->items[min]) {
        heapSwapValue(&(heap->items[idx]), &(heap->items[min]));
        idx = min;
    }
    else
        break;
}
The condition of the while is not sufficient. It may be the case that there is no right child but you need to do the swap with the left child. Moreover as Henk suggests, you need to check if the calculated minimum is in fact smaller than your current value.
