Avoiding duplicates in a 2D array? - c

I am writing a program in C which needs to read a set of values (integers) into a 2D array and then perform certain mathematical operations on it. I have decided to implement a check while the user is inputting the values, to prevent them from entering values that are already present in the array.
I am however unsure of how to go about this check. I figured I might need some sort of recursive function to check all the elements previous to the one that's being entered, but I don't know how to implement it.
Please find below a snippet of my code for illustrative purposes:
row and col are values entered by the user for the dimensions of the array
for (int i=0; i<row;i++){
for (int j=0; j<col; j++){
scanf("%d", &arr[i][j]); //take in elements
}
}
for (int i = 0; i < row; i++)
{
for (int j = 0; i < col; j++)
{
if (arr[i][j] == arr[i][j-1]){
printf("Duplicate.\n");}
else {}
}
}
I know this is probably not correct but it's my attempt.
Any help would be much appreciated.

I would suggest that you store every element you read in a temporary 1D array. Every time you scan a new element, traverse the 1D array to check whether the value already exists. Although this is not optimal, it is at least simpler and less expensive than traversing the 2D array every time.
Example:
int temp[SIZE]; //SIZE must be at least row * col
int elements = 0;
for (int i = 0; i < row; i++) {
    for (int j = 0; j < col; j++) {
        scanf("%d", &arr[i][j]); //take in elements
        for (int k = 0; k < elements; k++) { //compare with earlier values only
            if (temp[k] == arr[i][j])
                printf("Duplicate.\n"); //or do whatever you wish
        }
        temp[elements] = arr[i][j]; //store after the check, so the value is not compared with itself
        elements++;
    }
}

A balanced tree inserts and searches in O(log N) time.
Since the algorithms are quite simple & standard and were published in the seminal books by Knuth, there are plenty of implementations out there, including a clear and concise one at codereview.SE (which is thus automatically CC-BY-SA 3.0; do apply a bugfix in the answer). Using it (as well as virtually any other one) is simple: start with node* root = NULL;, then insert and search, and finally free_tree.
Asymptotically, the best method is a hash table with O(1) on average for both, but that is probably overkill (the algorithms are much more complex and the memory footprint is larger) unless you have a lot of numbers. For C++, there's a standard implementation, and there are plenty of third-party ones for C, too.
If your number of input values is small, even the tree may be overkill, and simply looking through previous values would be fast enough. If your 2D array is contiguous in memory, you can access it as 1D with int* arr1d = (int*)&arr2d.
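For the small-input case, the "look through previous values" idea can be sketched roughly like this. It assumes the 2D array is one contiguous block, so it can be filled through a flat pointer. The names seen_before and read_unique are made up for illustration, and rejecting the value and asking again is only one possible reaction to a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true if value already occurs among the first n ints at vals. */
static bool seen_before(const int *vals, size_t n, int value)
{
    assert(vals != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        if (vals[k] == value)
            return true;
    }
    return false;
}

/* Read rows * cols distinct ints from `in` into a contiguous row-major
 * buffer. A duplicate is rejected and does not consume a slot.
 * Returns the number of values stored, which is less than rows * cols
 * if the input ends or is not a number. */
static size_t read_unique(FILE *in, int *flat, size_t rows, size_t cols)
{
    size_t count = 0;
    while (count < rows * cols) {
        int value;
        if (fscanf(in, "%d", &value) != 1)
            break;                      /* EOF or non-numeric input */
        if (seen_before(flat, count, value)) {
            printf("Duplicate %d, please enter another value.\n", value);
            continue;
        }
        flat[count++] = value;
    }
    return count;
}
```

With the array from the question, a call could look like read_unique(stdin, &arr[0][0], row, col). The scan is O(N) per value, so the total is O(N^2), which is usually fine for input typed by hand.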

Related

Is there an approach to traverse array randomly?

I am trying to compare linear memory access to random memory access. I am traversing an array in the order of its indices to log the performance of linear memory access. However, to log the performance of random memory access I want to traverse my array randomly, i.e. arr[8], arr[17], arr[34], arr[2]...
Can I use pointer chasing to achieve this while ensuring that no index is accessed twice? Is pointer chasing the best approach in this case?
If your goal is to show that sequential access is faster than non-sequential access, pointer chasing only in the latter case is not a good way to demonstrate that. You would be comparing access via a single pointer plus a simple offset against dereferencing one or more pointers before offsetting.
To use pointer chasing, you'd have to apply it to both cases. Here's an example:
int arr[n], i;
int *unshuffled[n];
int *shuffled[n];
for(i = 0; i < n; i++) {
unshuffled[i] = arr + i;
}
/* I'll let you figure out how to randomize your indices */
shuffle(unshuffled, shuffled);
/* Do timing on these two loops */
for(i = 0; i < n; i++) {
do_stuff(*unshuffled[i]);
}
for(i = 0; i < n; i++) {
do_stuff(*shuffled[i]);
}
If you want to time the direct access better though, you could construct some simple formula for advancing the index instead of randomizing the access completely:
for(i = 0; i < n; i++) {
do_stuff(arr[i]);
}
for(i = 0; i < n; i++) {
do_stuff(arr[i / 2 + (i % 2) * (n / 2)]);
}
This will only work properly for even n as shown, but it illustrates the idea. You could go so far as to compensate for the extra flops in computing the index within do_stuff.
Probably the most apples-to-apples test would be to literally access the indices you want, without loops or additional computations:
do_stuff(arr[0]);
do_stuff(arr[1]);
do_stuff(arr[2]);
...
do_stuff(arr[123]);
do_stuff(arr[17]);
do_stuff(arr[566]);
...
Since I'd imagine you'd want to test with large arrays, you can write a program to generate the actual test code for you, and possibly compile and run the result.
I can tell you that for arrays in C the access time is constant regardless of the index being accessed. There will be no difference between accessing them randomly or sequentially other than the fact that randomizing will in itself introduce additional computations.
But, to really answer your question, you would probably be best off to build some kind of lookup array and shuffle it a few times and use that array to get the next index. Obviously, you would be accessing two arrays, one sequentially and another randomly, by doing so, thus making the exercise pretty much useless.
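One way to get a random order in which no index is visited twice is to shuffle an index array once, before the timed loop. The sketch below uses a Fisher-Yates shuffle. The function names are invented for this example, and rand() is only a stand-in: it has a slight modulo bias, and on platforms where RAND_MAX is small it cannot shuffle very large arrays well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx with 0..n-1 and shuffle it in place (Fisher-Yates).
 * Every index still appears exactly once afterwards. */
static void shuffled_indices(size_t *idx, size_t n)
{
    assert(idx != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Visit every element of arr exactly once, in the order given by idx. */
static long sum_in_order(const int *arr, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += arr[idx[i]];
    return sum;
}
```

To keep the comparison fair, the sequential run can go through the same sum_in_order function with an index array that was filled but not shuffled, so both runs pay for the same extra lookup.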

Optimising multiple stacked for loops

I am currently refactoring an old program and I am having trouble with finding a way to optimize a certain piece of code.
The primary aim is memory usage reduction over performance as the system is embedded.
for(n = 0; n < NUMBER_VARS_IN_STRUCT1; n++) {
int m;
for(m = 0; m < NUMBER_OF_LANGUAGES; m++) {
UnicodeStrCat(2, tx_data_1, struct1.var1[n][m], L"\r\n");
SendSerialUserData(UNICODE);
}
for(m = 0; m < 4; m++) {
sprintf(ansicode_text, "%.8f\r\n", (double) struct1.var2[n][m]);
StrAnsiToUnicode(tx_data_1, ansicode_text);
SendSerialUserData(UNICODE);
}
for(m = 0; m < 4; m++) {
sprintf(ansicode_text, "%.8f\r\n", (double) struct1.var3[n][m]);
StrAnsiToUnicode(tx_data_1, ansicode_text);
SendSerialUserData(UNICODE);
}
The code is much longer (~250 lines of the same sort of thing) and is then repeated in a similar manner for allowing data to be read back in to the device.
I had thought that a solution to reduce memory usage would be to potentially hardcode an array to hold pointers to each of the array values in each structure (Or if I know the size of each array then I could increment the pointer location), and then largely reduce the code size by simply cycling through this.
The output of the function is to print out a large table of data through a serial bus.
Thank you in advance for any help
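The pointer-table idea described above might look something like the following sketch. It is only an illustration under several assumptions: the numeric members are float arrays indexed [record][value], snprintf is available on the target, and field_desc, dump_record and the emit callback are hypothetical names standing in for the StrAnsiToUnicode and SendSerialUserData steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor: one entry per numeric member of the struct.
 * Assumes each member is a float array laid out as [record][per_record]. */
struct field_desc {
    const float *base;     /* address of member[0][0] */
    size_t per_record;     /* second dimension, e.g. 4 for var2 and var3 */
};

/* Format all values of record n, one "%.8f\r\n" line each, and pass every
 * line to emit(). emit may be NULL for a dry run. Returns the line count. */
static size_t dump_record(const struct field_desc *table, size_t fields,
                          size_t n, void (*emit)(const char *line))
{
    char line[64];         /* large enough for any float printed with %.8f */
    size_t lines = 0;

    assert(table != NULL || fields == 0);
    for (size_t f = 0; f < fields; f++) {
        const float *row = table[f].base + n * table[f].per_record;
        for (size_t m = 0; m < table[f].per_record; m++) {
            snprintf(line, sizeof line, "%.8f\r\n", (double)row[m]);
            if (emit != NULL)
                emit(line);
            lines++;
        }
    }
    return lines;
}
```

A table entry would then be something like { &struct1.var2[0][0], 4 }, and the ~250 lines of near-identical loops shrink to one loop over the table. Whether this actually reduces the memory footprint depends on the compiler and the target, so it is worth comparing the map files before and after.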

Create actual random data in arrays

I have been working on an assignment where I have to create a given number of arrays and fill them with random data. I want only a percentage of each array to be filled. The problem is that in every array the random values end up in the same positions rather than being spread out as I would like.
I have been creating the arrays in this way:
int **array = malloc(DOC * sizeof *array);
for (i = 0; i < DOC; i++)
{
array[i] = malloc(MAXWORDS * sizeof **array);
}
and filling them using :
srand((unsigned) time(&t));
and
for(i = 0; i < DOC; i++){
for(j = 0; j < MAXWORDS; j++){
array[i][rand() %percentage]=rand() %VALUE;
}
}
Where
int percentage = rand() %MAXWORDS/10;
MAXWORDS defines the length of the array
DOC the number of arrays
VALUE is the max random value
As you can see the random values are all behaving identically.
I know that this has to do with the way that srand depends on the time to generate the numbers, and the execution of the program is really fast, so the similar data are because of the "similar" time. So what I am asking for is either a different way to generate random values or some trick I could use to fill the arrays differently.
With "rand() % percentage" you are only picking elements within the first 10% of each array. Instead, you probably want something like this:
for (i = 0; i < DOC; ++i){
for (j = 0; j < MAXWORDS; ++j) {
if (rand() % 100 < 10) {
array[i][j] = rand() % VALUE;
}
}
}
This gives each element in the array roughly a 10% chance of being initialized, which should result (for large enough arrays) in about 10% of the elements being initialized.
If you want exactly 10% of the array to be initialized, you could instead do something like placing all indices (0...MAXWORDS-1) into an array, randomizing the array, and picking the first MAXWORDS/10 indices from the randomized array for initialization.
rand() and srand(), especially when used with %, do not produce random numbers as uniformly distributed as you may think.
Check Mersenne twister algorithm as an alternative pseudorandom number generator.
I think the problem is with how you are choosing the second index: rand() % percentage will always fill towards the front of the arrays.
The standard random number generator in C (srand + rand) tends to be pretty bad at generating numbers that pass statistical tests for randomness. There are more sophisticated random number generators with better properties available as part of the GNU Scientific Library that you may find helpful.
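The "exactly 10%" suggestion (shuffle the indices, take the first few) can be sketched as a partial Fisher-Yates shuffle. The helper name fill_exact is invented here, and it still uses rand(), so the quality caveats about rand() mentioned in the answers apply to it as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Write random values in [0, max_value) into exactly k distinct, randomly
 * chosen positions of row[0..n-1]. Other positions are left untouched.
 * Returns 0 on success, -1 if k > n or the scratch allocation fails. */
static int fill_exact(int *row, size_t n, size_t k, int max_value)
{
    assert(max_value > 0);
    if (k > n)
        return -1;
    if (k == 0)
        return 0;

    size_t *idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        idx[i] = i;

    /* Partial Fisher-Yates: only the first k slots need to be shuffled. */
    for (size_t i = 0; i < k; i++) {
        size_t j = i + (size_t)rand() % (n - i);   /* i <= j < n */
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        row[idx[i]] = rand() % max_value;
    }
    free(idx);
    return 0;
}
```

With the names from the question, each array could be filled with fill_exact(array[i], MAXWORDS, MAXWORDS / 10, VALUE), after a single call to srand at program start.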

Grid containing apples

I found this question on a programming forum:
A table composed of N*M cells, each having a certain quantity of apples, is given. You start from the upper-left corner. At each step you can go down or right one cell. Design an algorithm to find the maximum number of apples you can collect if you are moving from the upper-left corner to the bottom-right corner.
I have thought of three different complexities[in terms of time & space]:
Approach 1[quickest]:
for(j=1,i=0;j<column;j++)
apple[i][j]=apple[i][j-1]+apple[i][j];
for(i=1,j=0;i<row;i++)
apple[i][j]=apple[i-1][j]+apple[i][j];
for(i=1;i<row;i++)
{
for(j=1;j<column;j++)
{
if(apple[i][j-1]>=apple[i-1][j])
apple[i][j]=apple[i][j]+apple[i][j-1];
else
apple[i][j]=apple[i][j]+apple[i-1][j];
}
}
printf("\n maximum apple u can pick=%d",apple[row-1][column-1]);
Approach 2:
result is the temporary array having all slots initially 0.
int getMax(int i, int j)
{
if( (i<ROW) && (j<COL) )
{
if( result[i][j] != 0 )
return result[i][j];
else
{
int right = getMax(i, j+1);
int down = getMax(i+1, j);
result[i][j] = ( (right>down) ? right : down )+apples[i][j];
return result[i][j];
}
}
else
return 0;
}
Approach 3[least space used]:
It doesn't use any temporary array.
int getMax(int i, int j)
{
if( (i<M) && (j<N) )
{
int right = getMax(i, j+1);
int down = getMax(i+1, j);
return apples[i][j]+(right>down?right:down);
}
else
return 0;
}
I want to know which is the best way to solve this problem?
There's little difference between approaches 1 and 2. Approach 1 is probably a wee bit better, since it doesn't need the stack for the recursion that approach 2 uses (approach 2 works backwards from the end).
Approach 3 has exponential time complexity, so it is much worse than the other two, which have complexity O(rows*columns).
You can make a variant of approach 1 that proceeds along a diagonal to use only O(max{rows,columns}) additional space.
In terms of time, solution 1 is the best because there is no recursive function;
calling a recursive function takes time.
Improvement to First Approach
Do you really need the temporary array to be N by M?
No.
If the initial 2-d array has N columns, and M rows, we can solve this with a 1-d array of length M.
Method
In your first approach you save all of the subtotals as you go, but you really only need to know the apple-value of the cell to the left and above when you move to the next column. Once you have determined that, you don't look at those previous cells ever again.
The solution then is to write-over the old values when you start on the next column over.
The code will look like the following (I'm not actually a C programmer, so bear with me):
The Code
int getMax()
{
    //apple[][] is the original apple array, indexed as apple[row][column]
    //N is # of columns of apple[][]
    //M is # of rows of apple[][]
    //temp[] is initialized to zeroes, and has length M
    for (int currentCol = 0; currentCol < N; currentCol++)
    {
        temp[0] += apple[0][currentCol]; //Nothing above top row
        for (int i = 1; i < M; i++)
        {
            int applesToLeft = temp[i];
            int applesAbove = temp[i-1];
            if (applesToLeft > applesAbove)
            {
                temp[i] = applesToLeft + apple[i][currentCol];
            }
            else
            {
                temp[i] = applesAbove + apple[i][currentCol];
            }
        }
    }
    return temp[M - 1];
}
Note: there isn't any reason to actually store the values of applesToLeft and applesAbove into local variables, and feel free to use the ? : syntax for the assignment.
Also, if there are fewer columns than rows, you should rotate this so the 1-d array has the shorter length.
Doing it this way is a direct improvement over your first approach, as it saves memory, and plus iterating over the same 1-d array really helps with caching.
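For anyone who wants to try the one-array idea on a concrete grid, here is a self-contained sketch. It differs from the code above in two ways that are my own choices: the grid is passed as a flat row-major buffer, and the scratch array runs along a row (length cols) rather than a column. The name max_apples is made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* grid is row-major: cell (r, c) is grid[r * cols + c].
 * Returns the best total on a down/right path from the top-left to the
 * bottom-right cell, 0 for an empty grid, or -1 if allocation fails. */
static long max_apples(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL || rows == 0 || cols == 0);
    if (rows == 0 || cols == 0)
        return 0;

    long *best = calloc(cols, sizeof *best);
    if (best == NULL)
        return -1;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            long prev;
            if (r == 0 && c == 0)
                prev = 0;                 /* starting cell */
            else if (r == 0)
                prev = best[c - 1];       /* top row: only from the left */
            else if (c == 0)
                prev = best[c];           /* left column: only from above */
            else
                prev = best[c] > best[c - 1] ? best[c] : best[c - 1];
            /* best[c] held the value for the row above, now overwritten */
            best[c] = prev + grid[r * cols + c];
        }
    }

    long result = best[cols - 1];
    free(best);
    return result;
}
```

For the 3x3 grid 1 2 3 / 4 5 6 / 7 8 9, the best path is 1, 4, 7, 8, 9, so the function returns 29.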
I can only think of one reason to use a different approach:
Multi-Threading
To gain the benefits of multi-threading for this problem, your 2nd approach is just about right.
In your second approach you use a memo to store the intermediate results.
If you make your memo thread-safe (by locking or using a lock-free hash-set) , then you can start multiple threads all trying to get the answer for the bottom-right corner.
[// Edit: actually since assigning ints into an array is an atomic operation, I don't think you would need to lock at all ].
Make each call to getMax choose randomly whether to do the right getMax or the down getMax first.
This means that each thread works on a different part of the problem and since there is the memo, it won't repeat work a different thread has already done.

alternative to multidimensional array in c

I have the following code:
#define FIRST_COUNT 100
#define X_COUNT 250
#define Y_COUNT 310
#define Z_COUNT 40
struct s_tsp {
short abc[FIRST_COUNT][X_COUNT][Y_COUNT][Z_COUNT];
};
struct s_tsp xyz;
I need to run through the data like this:
for (int i = 0; i < FIRST_COUNT; ++i)
for (int j = 0; j < X_COUNT; ++j)
for (int k = 0; k < Y_COUNT; ++k)
for (int n = 0; n < Z_COUNT; ++n)
doSomething(xyz, i, j, k, n);
I've tried to think of a more elegant, less brain-dead approach. ( I know that this sort of multidimensional array is inefficient in terms of cpu usage, but that is irrelevant in this case.) Is there a better approach to the way I've structured things here?
If you need a 4D array, then that's what you need. It's possible to 'flatten' it into a single dimensional malloc()ed 'array', however that is not quite as clean:
abc = malloc(sizeof(short)*FIRST_COUNT*X_COUNT*Y_COUNT*Z_COUNT);
Accesses are also more difficult:
*(abc + X_COUNT*Y_COUNT*Z_COUNT*i + Y_COUNT*Z_COUNT*j + Z_COUNT*k + n)
So that's obviously a bit of a pain.
But you do have the advantage that if you need to simply iterate over every single element, you can do:
for (int i = 0; i < FIRST_COUNT*X_COUNT*Y_COUNT*Z_COUNT; i++) {
    doWhateverWith(*(abc + i));
}
Clearly this method is terribly ugly for most uses, and is a bit neater for one type of access. It's also a bit more memory-conservative and only requires one pointer-dereference rather than 4.
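The index arithmetic is easier to get right when it is wrapped in one small helper. The following is a minimal sketch: flat_index is a name invented for this example, and the dimensions are passed in as parameters instead of using the macros from the question, so it can be tried on a small array.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element [i][j][k][n] in an array whose last three
 * dimensions are x_count, y_count and z_count. The size of the first
 * dimension is not needed for the offset. */
static size_t flat_index(size_t i, size_t j, size_t k, size_t n,
                         size_t x_count, size_t y_count, size_t z_count)
{
    assert(j < x_count && k < y_count && n < z_count);
    /* Same as i*X*Y*Z + j*Y*Z + k*Z + n, written in nested form. */
    return ((i * x_count + j) * y_count + k) * z_count + n;
}
```

An access then reads abc[flat_index(i, j, k, n, X_COUNT, Y_COUNT, Z_COUNT)], and the stride of each index is the product of all dimensions to its right.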
NOTE: The intention of the examples used in this post is just to explain the concepts. So the examples may be incomplete, may lack error handling, etc.
When it comes to usage of multi-dimension array in C, the following are the two possible ways.
Flattening of Arrays
In C, arrays are implemented as a contiguous memory block. This information can be used to manipulate the values stored in the array and allows rapid access to a particular array location.
For example,
int arr[10][10];
int *ptr = (int *)arr;
ptr[11] = 10;
// this is equivalent to arr[1][1] = 10: the array is declared as 2D
// and manipulated here as a single-dimensional array.
The technique of exploiting the contiguous nature of arrays is known as flattening of arrays.
Ragged Arrays
Now, consider the following example.
const char *list[3];
list[0] = "United States of America";
list[1] = "India";
list[2] = "United Kingdom";
for (int i = 0; i < 3; i++)
    printf(" %zu ", strlen(list[i]));
// prints 24 5 14
This type of implementation is known as a ragged array, and it is useful in places where strings of variable size are stored. A popular method is to do the dynamic memory allocation separately for every dimension.
NOTE: The command line argument (char *argv[]) is passed only as ragged array.
Comparing flattened and ragged arrays
Now, let's consider the following code snippet, which compares flattened and ragged arrays.
/* Note: lacks error handling */
int flattened[30][20][10];
int ***ragged;
int i,j,numElements=0,numPointers=1;
ragged = (int ***) malloc(sizeof(int **) * 30);
numPointers += 30;
for( i=0; i<30; i++) {
ragged[i] = (int **)malloc(sizeof(int*) * 20);
numPointers += 20;
for(j=0; j<20; j++) {
ragged[i][j]=(int*)malloc(sizeof(int) * 10);
numElements += 10;
}
}
printf("Number of elements = %d",numElements);
printf("Number of pointers = %d",numPointers);
// it prints
// Number of elements = 6000
// Number of pointers = 631
In the above example, the ragged array requires 631 pointers, in other words 631 * sizeof(int *) bytes of extra memory for pointing to 6000 integers. The flattened array, by contrast, requires only one base pointer: the name of the array is enough to point to the 6000 contiguous memory locations.
But OTOH, the ragged arrays are flexible. In cases where the exact number of memory locations required is not known you cannot have the luxury of allocating the memory for worst possible case. Again, in some cases the exact number of memory space required is known only at run-time. In such situations ragged arrays become handy.
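When only the first dimension is unknown until run time, there is also a middle ground between the two: allocate the whole block with one malloc through a pointer to a fixed-size inner array. The sketch below reuses the 30 x 20 x 10 shape from the comparison. The names plane_t and alloc_cube are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

enum { D1 = 30, D2 = 20, D3 = 10 };

/* One "plane" is a D2 x D3 block of ints; a cube is D1 planes back to back. */
typedef int plane_t[D2][D3];

/* Arrays contain no padding, so a plane is exactly D2 * D3 ints. */
static_assert(sizeof(plane_t) == D2 * D3 * sizeof(int), "plane is contiguous");

/* Allocate the whole cube with a single malloc. Returns NULL on failure.
 * The caller releases it with a single free(). */
static plane_t *alloc_cube(void)
{
    return malloc(D1 * sizeof(plane_t));
}
```

The result is indexed as cube[i][j][k] like the ragged version, but it needs one pointer instead of 631 and stays contiguous. The price is that D2 and D3 must be fixed at compile time in this form.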
Row-major and column-major of Arrays
C follows row-major ordering for multi-dimensional arrays. Flattening of arrays can be viewed as a consequence of this aspect of C. The significance of row-major order is that it fits the natural way in which most accesses are made in programs. For example, let's look at traversing an N * M 2D matrix,
for(i=0; i<N; i++) {
for(j=0; j<M; j++)
printf("%d ", matrix[i][j]);
printf("\n");
}
Each row in the matrix is accessed one by one, by varying the column index most rapidly. The C array is arranged in memory in this natural way. In contrast, consider the following example,
for(i=0; i<M; i++) {
for(j=0; j<N; j++)
printf("%d ", matrix[j][i]);
printf("\n");
}
This changes the row index more frequently than the column index, and because of this there is a significant difference in efficiency between these two code snippets. Yes, the first one is more efficient than the second one!
Because the first one accesses the array in the natural order (row-major order) of C, it is faster, whereas the second one jumps a whole row ahead on every access and takes more time. The difference in performance widens as the number of dimensions and the size of the elements increase.
So when working with multi-dimensional arrays in C, it's good to consider the above details!
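To see the two traversal orders side by side, here is a small sketch that sums a matrix both ways. Both functions visit the same cells and return the same total; only the step between consecutive accesses differs. The function names are made up, and the matrix is passed as a flat row-major buffer to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major traversal: consecutive accesses are 1 element apart. */
static long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    assert(m != NULL || rows == 0 || cols == 0);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Column-major traversal: consecutive accesses are cols elements apart. */
static long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    assert(m != NULL || rows == 0 || cols == 0);
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

Timing both functions (for example with clock() from time.h) on a matrix much larger than the CPU cache is a simple way to measure the effect on a given machine. How big the gap is depends on the hardware and on compiler optimizations.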

Resources