How many elements are full in a C array - c

If you have an array in C, how can you find out how much of it is filled?

In a C array, any element is an object. It's not like in Java where you have references that first have to be assigned to point to objects. Anything in C behaves like a primitive type in Java.
If you have an array of pointers in C, you may view this similar to how things in Java work. You can use null pointers to designate "is not filled to point to an object":
// creates an array of 10 pointers, and initializes all of
// them to null pointers. If you leave off "{ 0 }", you
// have to manually initialize them!
struct foo *array[10] = { 0 };
Then you can simply test with
if(array[i] == 0) {
printf("Position %d does not point to an object!\n", i);
}

You need to keep track of this yourself. There is no concept of "full" (or anything in between for that matter): you have to define this.
Of course if the elements are contiguous in the array, you could use a NULL element to signify the "end" of the array thus defining a "full" state at the same time.
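The sentinel idea above can be sketched as follows. This is a minimal illustration, not the only way to do it: the function name count_until_null is made up for this example, and it assumes the used slots are contiguous and that a NULL terminator really is present inside the array.

```c
#include <stddef.h>

/* Counts the leading non-NULL pointers in a NULL-terminated array.
   Assumes the used slots are contiguous and that a NULL terminator
   is present somewhere within the array (hypothetical helper). */
size_t count_until_null(const char *const *items)
{
    size_t n = 0;
    while (items[n] != NULL) {
        n++;
    }
    return n;
}
```

Declaring the array as `const char *names[5] = { "a", "b", "c" };` leaves the last two slots as null pointers, so the function would report 3 for it. If every slot is in use there is no terminator and the loop runs off the end, which is why the array needs one spare slot.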

It's all filled, so the answer is whatever the size of your array is. An array is a contiguous memory segment, so unless you initialize it (or it has static storage duration, in which case it starts out zeroed), it simply holds whatever was at that memory location before.
But you probably want to know how much of it is filled with data that you care about, and not with random data. In that case, there is no way of knowing that unless you keep track of it yourself.

I agree with the other answers, but I can suggest a way to make your work easier. You can manage the array like an object and control how data is added and removed. If you implement two functions, one to add elements and one to remove them, with the proper logic to handle fragmentation and multi-threading, you can track the number of elements in the array by reading a counter that is written only by the add and remove functions. That way you don't have to run a loop every time you need to count the elements.
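A minimal sketch of that counter-based approach might look like this. The names int_list, int_list_add, int_list_remove and CAPACITY are invented for the example, the element type is assumed to be int, and nothing here is thread-safe; a multi-threaded program would need to add its own locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 10  /* hypothetical fixed capacity */

/* A fixed-capacity array that tracks how many slots are in use. */
struct int_list {
    int    items[CAPACITY];
    size_t count;            /* number of filled slots, always <= CAPACITY */
};

/* Appends value; returns false if the array is already full. */
bool int_list_add(struct int_list *list, int value)
{
    if (list->count == CAPACITY) return false;
    list->items[list->count++] = value;
    return true;
}

/* Removes the element at index by shifting the tail down one slot,
   so the filled elements stay contiguous. Returns false on a bad index. */
bool int_list_remove(struct int_list *list, size_t index)
{
    if (index >= list->count) return false;
    for (size_t i = index; i + 1 < list->count; i++) {
        list->items[i] = list->items[i + 1];
    }
    list->count--;
    return true;
}
```

Keeping the filled elements packed at the front avoids fragmentation, at the cost of a shift on every removal. The fill level is then just `list.count`, with no loop needed.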

From the C language perspective, there is no concept of "filled". Once an array is defined, memory is allocated to it. Arrays with static storage duration, like array1 in the example below, have their elements initialized to 0. Uninitialized local arrays like array2, however, hold indeterminate values.
So, the notion of "filled" has to be supplied by the program. One possible "in-band" way is to:
(a) Choose one specific value of the element type (e.g. 0xFFFFFFFF) and use it to detect fill/empty property of each array element (However, realize that this approach takes away one otherwise valid value from the element set.), and
(b) "initialize" all the elements of the array to that disallowed value at suitable position in the program scope.
(c) To find array fill level, count the number of valid elements.
$ cat t2.c
#include <stdio.h>
#define N 10
typedef unsigned long int T;
static const T EmptyElementValue = 0xFFFFFFFF;
// Choose any suitable value above. However, the chosen value can
// no longer be stored as real data: it always counts as "empty".
static T array1[ N ];
void
printArray( T a[], size_t length )
{
size_t i;
for( i = 0; i < length; ++i )
{
printf( "%lu, ", a[ i ] );
}
printf( "\n" );
}
size_t
numFilledElements( T a[], size_t length )
{
size_t fillCount = 0;
size_t i;
for( i = 0; i < length; ++i )
{
if( a[ i ] != EmptyElementValue )
{
fillCount += 1;
}
}
return fillCount;
}
int main()
{
T array2[ N ];
size_t i;
printArray( array1, N );
printArray( array2, N );
//------------------------------------------//
// Make array2 empty
for( i = 0; i < N; ++i )
{
array2[ i ] = EmptyElementValue;
}
// Use some elements in array2
array2[ 2 ] = 20;
array2[ 3 ] = 30;
array2[ 7 ] = 70;
array2[ 8 ] = 80;
printf( "Number of elements \"filled\" in array2 = %zu\n",
numFilledElements( array2, N ));
// Stop using some elements in array2
array2[ 3 ] = EmptyElementValue;
printf( "Number of elements \"filled\" in array2 = %zu\n",
numFilledElements( array2, N ) );
return 0;
}
$ gcc -Wall t2.c -o t2
$ ./t2
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 60225, 2280452, 1627469039, 1628881817, 2281060, 2280680, 1628304199, 1628881818, 47,
Number of elements "filled" in array2 = 4
Number of elements "filled" in array2 = 3
$

Subtract the number of empty elements from the size of the array. ;-)
Sorry, there is no way (except you keeping track), to tell whether an array element has been modified.

In C, there is no built-in way of knowing how many elements are filled with data that you care about. You will need to build it yourself. As was previously said, if you can reserve a value that will not represent anything (0, for example), you could:
Count the elements that do not have this reserved value.
If your filled elements are contiguous, scan for the first occurrence of the reserved value (a sentinel).
On the other hand, if every value of the element type is meaningful, you will need a separate set of flags that keeps track of which elements are set and which aren't.
For example, if you have an array of 32 elements or fewer, a single 32-bit unsigned integer is enough to keep track of your array:
1100010 ...
Values:
1 -> set
2 -> set
3 -> not set
4 -> not set
5 -> not set
6 -> set
etc.
So whenever you fill an element, you call a function that sets the corresponding bit, and when you "unfill" it, you clear that bit.
Once this is done, all you need to do is run a popcount over the flags.
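As a rough sketch of the flag idea for arrays of up to 32 elements: the helper names below (mark_filled, mark_empty, is_filled, count_filled) are made up for this example, and the popcount is a plain portable loop rather than a compiler builtin.

```c
#include <stdint.h>

/* One bit per array slot; bit i set means slot i holds real data.
   A 32-bit mask covers arrays of up to 32 elements, so i must be < 32. */
void mark_filled(uint32_t *mask, unsigned i) { *mask |=  (UINT32_C(1) << i); }
void mark_empty(uint32_t *mask, unsigned i)  { *mask &= ~(UINT32_C(1) << i); }
int  is_filled(uint32_t mask, unsigned i)    { return (int)((mask >> i) & 1u); }

/* Portable popcount: each iteration clears the lowest set bit. */
unsigned count_filled(uint32_t mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;
        n++;
    }
    return n;
}
```

Unlike the sentinel approach, this does not take any value away from the element type, and the filled slots do not have to be contiguous.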

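You could do a while (yourArray[i] != NULL) loop and increment an integer counter on each pass; the final count tells you how many elements are set. This only works for an array of pointers whose unused slots are NULL, and the loop must also stop when it reaches the end of the array.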

Related

How to count how many elements have been added to an array in C

So basically I have an array with a size of 5 and I want to count how many elements it has inside it.
int main()
{
int size;
char ola[5];
ola[0] = 'p';
size = sizeof(ola);
printf("%d\n", size);
return 0;
}
This code returns 5, but I expected 1.
What should I change?
The C standard does not provide any usable concept of elements in an array being used or unused. An array is simply a sequence of elements. It is up to you to track how many elements in it are of interest to you; you must write your own code for this.
You can do this by keeping a separate counter, by using a particular element value to denote that an element is “unused,” by using a particular element value to denote that an element ends the elements of interest, by calculating the end from related information, or by other means.
The C library includes many routines that use the third method, where the “null character,” with value zero, is used to mark the end of a string of characters. When you use this method, be sure to include room for the terminating null character in any array. For example, if you want an array to hold strings of length up to n characters, define the array to have at least n+1 elements.
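To illustrate the n+1 rule for strings, here is a small sketch. MAX_LEN and store_name are hypothetical names chosen for the example; the only library calls used are strlen and memcpy.

```c
#include <string.h>

#define MAX_LEN 4  /* hypothetical: longest string we want to store */

/* Copies src into a buffer with room for MAX_LEN characters plus the
   terminating '\0', truncating if needed, and returns the stored length. */
size_t store_name(char buf[MAX_LEN + 1], const char *src)
{
    size_t n = strlen(src);
    if (n > MAX_LEN) n = MAX_LEN;
    memcpy(buf, src, n);
    buf[n] = '\0';          /* the sentinel that marks the end */
    return strlen(buf);     /* counts characters up to the sentinel */
}
```

With a buffer declared as `char buf[MAX_LEN + 1]`, sizeof buf is always 5, while strlen(buf) reports how many characters are actually in use.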
Keep a count as you add elements to the array, rather than using sizeof (which returns the total capacity of the array in bytes), for example
#include <stdio.h>
int main(void)
{
int size = 0; // initialize size
char ola[5] = ""; // also initialize ola (with 5 '\0's)
ola[size++] = 'p'; // use and increment size
// size = sizeof(ola); // size is already updated
printf("%d\n", size);
// one more character
ola[size++] = 'o';
printf("%d\n", size);
printf("The string is [%s]\n", ola);
return 0;
}
It returns 5 because that is the size of your whole array. You could try:
size = sizeof(ola[0]);
This gives 1, but only because that is the size of a single char element, not because one element has been filled. It is not a count of the elements in use.

Fastest data structure for inserting billions of integers?

I want a recommendation on the fastest data structure in C that can hold about 2 billion integers taken from input. The integer values will not be less than 0 and will not be greater than 2 billion. My goal is to remove any duplicate values and sort the elements of the data structure. If possible, I want to be able to do the insert operation in O(1) or O(log n), or as quickly as possible. I also want to avoid trees if possible. I would appreciate any feedback or recommendation about this.
Edit: Using a normal array would take a really long time. So, I want to use some other data structure than the array such as stack, queue, etc.
Since you have a given number of values, and the range of those values is the same as the number of values, you can implement the list as an array where each array index represents a value and the value of each array element represents whether or not a given value is in the list.
For example:
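// needs <stdio.h> and <stdlib.h>
#define MAX_VALUE 2000000000UL   // largest value that can appear in the input
// calloc returns zeroed memory, so every value starts out "not in the list"
unsigned char *arr = calloc(MAX_VALUE + 1, 1);
unsigned long value, i;
if (arr == NULL) {
    exit(EXIT_FAILURE);   // about 2 GB may not be available
}
// populate list: read until the input runs out
while (scanf("%lu", &value) == 1) {
    if (value <= MAX_VALUE) {
        arr[value] = 1;
    }
}
// print list
for (i = 0; i <= MAX_VALUE; i++) {
    if (arr[i]) {
        printf("%lu\n", i);
    }
}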
Here we initialize the list to contain 0 for all values. Then we read in the values. If we read the value n, then we set arr[n] to 1. This does two things: it inserts the value in the list and eliminates duplicates by always setting the value to 1 as opposed to incrementing the value.
This gives O(1) insertions with duplicate removal, and the list is already sorted.
Note also that since each element of the array only needs to store the values 0 or 1, we can use char as the type to save memory. We can save further memory if we use a single bit to hold the 0 or 1 for a given value. Doing this will involve some bit shifting:
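// needs <stdio.h> and <stdlib.h>
#define MAX_VALUE 2000000000UL   // largest value that can appear in the input
// one bit per possible value, all cleared by calloc
unsigned char *arr = calloc(MAX_VALUE / 8 + 1, 1);
unsigned long value, i;
if (arr == NULL) {
    exit(EXIT_FAILURE);
}
// populate list: read until the input runs out
while (scanf("%lu", &value) == 1) {
    if (value <= MAX_VALUE) {
        arr[value / 8] |= 1u << (value % 8);
    }
}
// print list
for (i = 0; i <= MAX_VALUE; i++) {
    if (arr[i / 8] & (1u << (i % 8))) {
        printf("%lu\n", i);
    }
}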
This cuts the memory requirements down to about 250MB which is still large but manageable.
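Because the full-size version needs a very large input to try out, here is a scaled-down sketch of the same bitmap technique that works on an in-memory array. The function name sorted_unique and its parameters are assumptions made for this example; it uses CHAR_BIT instead of a hard-coded 8.

```c
#include <limits.h>
#include <stdlib.h>

/* Writes the distinct values of in[0..n-1], in ascending order, to out
   and returns how many were written. Values are assumed to lie in
   [0, max_value]; out must have room for n elements.
   Returns 0 if the bitmap cannot be allocated. */
size_t sorted_unique(const unsigned *in, size_t n, unsigned max_value, unsigned *out)
{
    size_t nbytes = (size_t)max_value / CHAR_BIT + 1;
    unsigned char *bits = calloc(nbytes, 1);   /* all bits start at 0 */
    size_t written = 0;

    if (bits == NULL) return 0;

    /* Insert: setting a bit twice has no further effect, so duplicates vanish. */
    for (size_t i = 0; i < n; i++) {
        if (in[i] <= max_value)                /* ignore out-of-range input */
            bits[in[i] / CHAR_BIT] |= (unsigned char)(1u << (in[i] % CHAR_BIT));
    }

    /* Walk the bitmap from 0 upward, so the output comes out sorted. */
    for (unsigned v = 0; ; v++) {
        if (bits[v / CHAR_BIT] & (1u << (v % CHAR_BIT)))
            out[written++] = v;
        if (v == max_value) break;             /* avoids overflowing v */
    }

    free(bits);
    return written;
}
```

Each insert is O(1), but producing the sorted output costs time proportional to the value range rather than to the number of inputs, which is only a good trade when the two are of similar size, as in the question.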

How to insert an element starting the iteration from the beginning of the array in c?

I have seen insertion of an element into an array done by iterating from the rear end. But I wonder if it is possible to insert while iterating from the front.
I finally figured out a way. Here is the code:
#include <stdio.h>
int main()
{
int number = 5; //element to be inserted
int array[10] = {1, 2, 3, 4, 6, 7, 8, 9};
int ele, temp;
int pos = 4; // position to insert
printf("Array before insertion:\n");
for (int i = 0; i < 10; i++)
{
printf("%d ", array[i]);
}
puts("");
for (int i = pos; i + 1 < 10; i++) // stop before array[i + 1] runs past the end
{
if (i == pos) // first element
{
ele = array[i + 1];
array[i + 1] = array[i];
}
else // rest of the elements
{
temp = array[i + 1];
array[i + 1] = ele;
ele = temp;
}
}
array[pos] = number; // element to be inserted
printf("Array after insertion:\n");
for (int i = 0; i < 10; i++)
{
printf("%d ", array[i]);
}
return 0;
}
The output looks like:
Array before insertion:
1 2 3 4 6 7 8 9 0 0
Array after insertion:
1 2 3 4 5 6 7 8 9 0
In C, arrays have a "native" built-in implementation based on the address (aka pointer) of the first element and the [] operator for element addressing.
Once an array has been allocated, its actual size is not automatically handled or checked: the code needs to make sure boundaries are not trespassed.
Moreover, C has no "empty" value for a variable or an array element: every element always holds some value, and an uninitialized local variable holds an indeterminate one.
Likewise, in C there's no such thing as insertion, appending or removal of an array element. You can simply refer to the n-th (with n starting at 0) array element by using the [] operator.
So, if you have an array, you cannot insert a new item at its n-th position. You can only read or (over)write any of its items.
Any other operation, like inserting or removing, requires ad-hoc code which basically boils down to shifting the arrays elements forward (for making room for insertion) or backward (for removing one).
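A shift-based insertion with a boundary check could be sketched like this. The function name insert_at and the separate count/capacity parameters are assumptions for the example; memmove is used because the source and destination ranges overlap.

```c
#include <stdbool.h>
#include <string.h>

/* Inserts value at position pos in an array holding *count used
   elements out of capacity slots. Returns false, leaving the array
   untouched, if there is no room or pos is past the used part. */
bool insert_at(int *array, size_t *count, size_t capacity, size_t pos, int value)
{
    if (*count >= capacity || pos > *count) return false;
    /* shift array[pos .. *count - 1] one slot toward the end */
    memmove(&array[pos + 1], &array[pos], (*count - pos) * sizeof array[0]);
    array[pos] = value;
    (*count)++;
    return true;
}
```

The caller keeps track of how many elements are in use, since the array itself cannot tell. Removal would be the mirror image: shift the tail one slot toward the front and decrement the count.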
This is the C-language nature and should not be seen as a limitation: any other language allowing for those array operations must have a lower-level hidden implementation or a non-trivial data structure to implement the arrays.
This means, in C, that while keeping the memory usage to a bare minimum, those array operations require some time-consuming implementation, like the item-shifting one.
You can then trade off memory usage against time and gain some overall efficiency by using, for example, singly or doubly linked lists. You lose some memory to the link pointer(s) in exchange for faster insertion and removal operations. This depends mostly on the implementation goals.
Finally, to get to the original question, an actual answer requires some extra details about the memory vs time trade off that can be done to achieve the goal.
The solution shown by @Krishna Acharya is a simple shift-based solution with no boundary check: a very simple and somewhat naive implementation.
A final note. The 0s shown at the end of the array in Krishna's output are not random values: because the initializer lists only eight values, C sets the remaining two elements to 0. Writing them out makes that intent explicit:
int array[10] = {1, 2, 3, 4, 6, 7, 8, 9, 0, 0};
Without any initializer at all, a local array would hold indeterminate values.

Most Frequent of every N Elements in C

I have a large array A of size [0, 8388608] of "relatively small" integers A[i] = [0, 131072] and I want to find the most frequently occurring element of every N=32 elements.
What would be faster,
A. Create an associative array B of size 131072, iterate through 32 elements, increment B[A[i]], then iterate through B, find the largest value, reset all elements in B to 0, repeat |A|/32 times.
B. qsort every 32 elements, find the largest range where A[i] == A[i-1] (and thus the most frequent element), repeat |A|/32 times.
(EDIT) C. Something else.
An improvement over the first approach is possible. There is no need to iterate through B. And it can be an array of size 131072
Every time you increment B[A[i]], look at the new value in that cell. Keep a variable highest_frequency_found_so_far that starts at zero; after every increment, compare the new value with it, and if the new value is higher, store it as the new maximum.
Alongside it, keep a variable value_associated_with_that, which records the value that produced the highest count.
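// needs <stddef.h>; the function name and the 'out' parameter are illustrative
#define BLOCK 32           // elements per block (N in the question)
#define MAX_VALUE 131072   // largest value stored in A
// Writes the most frequent value of each block of A into out[].
// n is assumed to be a multiple of BLOCK.
void most_frequent_per_block(const int *A, size_t n, int *out)
{
    // static: zero-initialized once, and kept off the stack
    static size_t B[MAX_VALUE + 1];
    for (size_t start = 0; start < n; start += BLOCK) {
        size_t highest_frequency_found_so_far = 0;
        int value_associated_with_that = 0;
        for (size_t i = start; i < start + BLOCK; i++) {
            const size_t new_frequency = ++B[A[i]];
            if (new_frequency > highest_frequency_found_so_far) {
                highest_frequency_found_so_far = new_frequency;
                value_associated_with_that = A[i];
            }
        }
        // now, 'value_associated_with_that' is the most frequent element
        out[start / BLOCK] = value_associated_with_that;
        // Thanks to @AkiSuihkonen for pointing out a really simple way to reset B each time.
        // B is big, so instead of zeroing each element explicitly, just do this loop to undo
        // the ++B[A[i]] from earlier:
        for (size_t i = start; i < start + BLOCK; i++) {
            --B[A[i]];
        }
    }
}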
what about a btree?
You only need a max of 32 nodes and can declare them up front.

Initializing entire 2D array with one value

With the following declaration
int array[ROW][COLUMN]={0};
I get the array with all zeroes but with the following one
int array[ROW][COLUMN]={1};
I don’t get the array with all one value. The default value is still 0.
Why this behavior and how can I initialize with all 1?
EDIT: I have just understood that using memset with a value of 1 will set each byte to 1, and hence the actual value of each array cell won't be 1 but 16843009. How do I set it to 1?
You get this behavior, because int array [ROW][COLUMN] = {1}; does not mean "set all items to one". Let me try to explain how this works step by step.
The explicit, overly clear way of initializing your array would be like this:
#define ROW 2
#define COLUMN 2
int array [ROW][COLUMN] =
{
{0, 0},
{0, 0}
};
However, C allows you to leave out some of the items in an array (or struct/union). You could for example write:
int array [ROW][COLUMN] =
{
{1, 2}
};
This means, initialize the first elements to 1 and 2, and the rest of the elements "as if they had static storage duration". There is a rule in C saying that all objects of static storage duration, that are not explicitly initialized by the programmer, must be set to zero.
So in the above example, the first row gets set to 1,2 and the next to 0,0 since we didn't give them any explicit values.
Next, there is a rule in C allowing lax brace style. The first example could as well be written as
int array [ROW][COLUMN] = {0, 0, 0, 0};
although of course this is poor style, it is harder to read and understand. But this rule is convenient, because it allows us to write
int array [ROW][COLUMN] = {0};
which means: "initialize the very first column in the first row to 0, and all other items as if they had static storage duration, i.e. set them to zero."
Therefore, if you attempt
int array [ROW][COLUMN] = {1};
it means "initialize the very first column in the first row to 1 and set all other items to zero".
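A quick way to convince yourself of this rule is to count how many elements end up equal to 1. The sketch below uses small hypothetical dimensions and a made-up function name, count_ones_after_init.

```c
#include <stddef.h>

enum { ROW = 2, COLUMN = 3 };  /* hypothetical small dimensions */

/* Returns how many elements of a freshly initialized array equal 1. */
size_t count_ones_after_init(void)
{
    int array[ROW][COLUMN] = {1};   /* only array[0][0] is given a value */
    size_t ones = 0;

    for (size_t i = 0; i < ROW; i++)
        for (size_t j = 0; j < COLUMN; j++)
            if (array[i][j] == 1)
                ones++;
    return ones;
}
```

The function returns 1 rather than ROW * COLUMN, because the other five elements are zero-initialized.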
As for how to initialize the whole array to a specific value/values, see https://stackoverflow.com/a/13488596/584518.
If you want to initialize the array to -1, you can use the following:
memset(array, -1, sizeof(array[0][0]) * ROW * COLUMN);
But this works only for 0 and -1, because memset sets every byte of the array to the same value.
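To see why only 0 and -1 come out as expected, it can help to apply memset to a single int and look at the result. The helper name int_after_memset is invented for this sketch; the value quoted for 1 assumes a 4-byte int, and the -1 case assumes two's complement representation.

```c
#include <string.h>

/* Fills every byte of an int with the given byte value, the way
   memset(array, byte, ...) fills each element, and returns the result. */
int int_after_memset(int byte)
{
    int x;
    memset(&x, byte, sizeof x);
    return x;
}
```

With a 4-byte int, int_after_memset(1) yields 0x01010101, which is 16843009, while 0 and -1 survive because all-zero bytes and all-one bytes happen to be the representations of those two values.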
int array[ROW][COLUMN]={1};
This initialises only the first element to 1. Everything else gets a 0.
In the first instance, you're doing the same - initialising the first element to 0, and the rest defaults to 0.
The reason is straightforward: for an array, the compiler will initialise every value you don't specify with 0.
With a char array you could use memset to set every byte, but this will not generally work with an int array (though it's fine for 0).
A general for loop will do this quickly:
for (int i = 0; i < ROW; i++)
for (int j = 0; j < COLUMN; j++)
array[i][j] = 1;
Or possibly quicker (depending on the compiler)
for (int i = 0; i < ROW*COLUMN; i++)
*((int *)array + i) = 1;
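To initialize a 2D array with zero, use the method below:
int arr[n][m] = {};
NOTE: this only works for initializing to 0. Empty braces are standard in C++ and in C23; in earlier versions of C, write {0} instead, and note that there a variable-length array cannot have an initializer at all.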
Note that GCC has an extension to the designated initializer notation which is very useful for the context. It is also allowed by clang without comment (in part because it tries to be compatible with GCC).
The extension notation allows you to use ... to designate a range of elements to be initialized with the following value. For example:
#include <stdio.h>
enum { ROW = 5, COLUMN = 10 };
int array[ROW][COLUMN] = { [0 ... ROW-1] = { [0 ... COLUMN-1] = 1 } };
int main(void)
{
for (int i = 0; i < ROW; i++)
{
for (int j = 0; j < COLUMN; j++)
printf("%2d", array[i][j]);
putchar('\n');
}
return 0;
}
The output is, unsurprisingly:
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
Note that Fortran 66 (Fortran IV) had repeat counts for initializers for arrays; it's always struck me as odd that C didn't get them when designated initializers were added to the language. And Pascal uses the 0..9 notation to designate the range from 0 to 9 inclusive, but C doesn't use .. as a token, so it is not surprising that was not used.
Note that the spaces around the ... notation are essentially mandatory; if they're attached to numbers, then the number is interpreted as a floating point number. For example, 0...9 would be tokenized as 0., ., .9, and floating point numbers aren't allowed as array subscripts.
With the named constants, ...ROW-1 would not cause trouble, but it is better to get into the safe habits.
Addenda:
I note in passing that GCC 7.3.0 rejects:
int array[ROW][COLUMN] = { [0 ... ROW-1] = { [0 ... COLUMN-1] = { 1 } } };
where there's an extra set of braces around the scalar initializer 1 (error: braces around scalar initializer [-Werror]). I'm not sure that's correct given that you can normally specify braces around a scalar in int a = { 1 };, which is explicitly allowed by the standard. I'm not certain it's incorrect, either.
I also wonder if a better notation would be [0]...[9] — that is unambiguous, cannot be confused with any other valid syntax, and avoids confusion with floating point numbers.
int array[ROW][COLUMN] = { [0]...[4] = { [0]...[9] = 1 } };
Maybe the standards committee would consider that?
If you are actually using C++ rather than C, use a vector of vectors instead:
vector<vector<int>> array(ROW, vector<int>(COLUMN, 1));
char grid[row][col];
memset(grid, ' ', sizeof(grid));
That's for initializing char array elements to space characters.
