How to save memory in an array of which many elements are always 0? - c

I have a 2tensor in C that looks like:
int n =4;
int l =5;
int p =6;
int q=2;
I then initialize each element of T
//loop over each of the above indices
T[n][l][p][q]=...
However, many of them are zero and there are symmetries such as.
T[4][3][2][1]=-T[3][4][2][1]
How can I save memory on the elements of T which are zero? Ideally I would like to place something like NULL in those positions so they use 0 instead of 8 bytes. Also, later on in the calculation I can check if they are zero or not by checking if they are equal to NULL
How do I implicitly include those symmetries in T with using excess memory?
Edit: the symmetry can perhaps be fixed with a different implementation. But what about the zeros? Is there any implementation to not have them waste memory?

You cannot influence the size of any variable by a value you write to it.
If you want to save memory you have not only to not use it, you have to not define a variable using it.
If you do not define a variable, then you have to not use it ever.
Then you have saved memory.
This is of course obvious.
Now, how to apply that to your problem.
Allow me to simplify, for one because you did not give enough information and explanation, at least not for me to understand every detail. For another, to keep the explanation simple.
So I hope that it suffices if I solve the following problem for you, which I think is kind of the little brother of your problem.
I have a large array in C (not really large, lets say N entries, with N==20).
But for special reasons, I will never need to actually read and write any even indices, they should act as if they contain 0, but I want to save the memory used by them.
So actually I want to only use M of the entries, with M*2==N.
So instead of
int Array[N]; /* all the theoretical elements */
I define
int Array[M]; /* only the actually used elements */
Of course I cannot access any of the elements which are not needed and it will not really be necessary.
But for the logic of my program, I want to be able to program as if I could access them, but be sure that they will always every only read 0 and ignore any written value.
So what I do is wrapping all accesses to the array.
int GetArray(int index)
{
if (index & 1)
{
/* odd, I need to really access the array,
but at a calculated index */
return Array[index/2];
} else
{
/* even, always 0 */
return 0;
}
}
void SetArray(int index, int value)
{
if (index & 1)
{
/* odd, I need to really access the array,
but at a calculated index */ */
Array[index/2] = value;
} else
{
/* even, no need to store anything, stays always "0" */
}
}
So I can read and write as if the array were twice as large, but guarantee not to ever use the faked elements.
And by mapping the indices as
actualindex = wantindex / 2
I ensure that I do not access beyond the size of the actually existing array.
Porting this concept now to the more complicated setup you have described is your job. You know all the details, you can test wether everything works.
I recommend to extend GetArray() and SetArray() by checks on the resulting index, to make sure that it is never outside of the actual array.
You can also add all kinds of self checks to verify that all your rules and expectations are met.

Related

Dynamically indexing an array in C

Is it possible to create arrays based of their index as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the run, without creating foo[0...3][0...4]?
If not, is there a data structure that allow me to do something similar to this in C?
No.
As written your code make no sense at all. You need foo to be declared somewhere and then you can index into it with foo[x][y] = someNr;. But you cant just make foo spring into existence which is what it looks like you are trying to do.
Either create foo with correct sizes (only you can say what they are) int foo[16][16]; for example or use a different data structure.
In C++ you could do a map<pair<int, int>, int>
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
for (int j = 0; j < y; j++)
foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instruction per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
Most efficient way to create a dynamic index of an array is to create an empty array of the same data type that the array to index is holding.
Let's imagine we are using integers in sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.
Let's say you have 1 million 64 bit integers in the array to index.
First of all you should order the data and eliminate duplicates. That's something easy to achieve by using qsort() (the quick sort C built in function) and some remove duplicate function such as
uint64_t remove_dupes(char *unord_arr, char *ord_arr, uint64_t arr_size)
{
uint64_t i, j=0;
for (i=1;i<arr_size;i++)
{
if ( strcmp(unord_arr[i], unord_arr[i-1]) != 0 ){
strcpy(ord_arr[j],unord_arr[i-1]);
j++;
}
if ( i == arr_size-1 ){
strcpy(ord_arr[j],unord_arr[i]);
j++;
}
}
return j;
}
Adapt the code above to your needs, you should free() the unordered array when the function finishes ordering it to the ordered array. The function above is very fast, it will return zero entries when the array to order contains one element, but that's probably something you can live with.
Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although pledging to powers of 10 will make everything easier, in case of integers.
uint64_t* idx = calloc(pow(10, indexdepth), sizeof(uint64_t));
This will create an empty index array.
Then populate the index. Traverse your array to index just once and every time you detect a change in the number of significant figures (same as index depth) to the left add the position where that new number was detected.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts by 10 (103456), you add an entry to the index, let's say that 103456 was detected at position 733, your index entry would be:
index[10] = 733;
Next entry begining by 11 should be added in the next index slot, let's say that first number beginning by 11 is found at position 2023
index[11] = 2023;
And so on.
When you later need to find some number in your original array storing 1 million entries, you don't have to iterate the whole array, you just need to check where in your index the first number starting by the first two significant digits is stored. Entry index[10] tells you where the first number starting by 10 is stored. You can then iterate forward until you find your match.
In my example I employed a small index, thus the average number of iterations that you will need to perform will be 1000000/100 = 10000
If you enlarge your index to somewhere close the length of the data the number of iterations will tend to 1, making any search blazing fast.
What I like to do is to create some simple algorithm that tells me what's the ideal depth of the index after knowing the type and length of the data to index.
Please, note that in the example that I have posed, 64 bit numbers are indexed by their first index depth significant figures, thus 10 and 100001 will be stored in the same index segment. That's not a problem on its own, nonetheless each master has his small book of secrets. Treating numbers as a fixed length hexadecimal string can help keeping a strict numerical order.
You don't have to change the base though, you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base 10 numbers ordered, using different numerical bases is nonetheless trivial in C, which is of great help for this task.
As you make your index depth become larger, the amount of entries per index segment will be reduced
Please, do note that programming, especially lower level like C consists in comprehending the tradeof between CPU cycles and memory use in great part.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as masive amounts of memory are cheap.
As SSDs' speed become closer to that of RAM, using files to store indexes is to be taken on account. Nevertheless modern OSs tend to load in RAM as much as they can, thus using files would end up in something similar from a performance point of view.

Most efficient way to check if elements in an array have changed

I have an array in c and I need to perform some operation only if the elements in an array have changed. However the time and memory taken for this is very important. I realized that an efficient way to do this would probably be to hash all the elements of the array and compare the result with the previous result. If they match that means the elements dont change. I would however like to know if this is the most efficient way of doing things. Also since the array is only 8 bytes long(1 byte for each element) which hashing function would be least time consuming?
The elements in an array are actually being received from another microcontroller. So they may or may not change depending on whether what the other micro-controller measured is the same or not
If you weren't tied to a simple array, you could create a "MRU" List of structures where the structure could contain a flag that indicates if the item was changed since it was last inspected.
Every time an item changes set the "changed flag" and move it to the head of the list. When you need to check for the changed items you traverse the list from the head and unset the changed flags and stopping at the first element with its change flag not set.
Sorry, I missed the part about the array being only 8 bytes long. With that info and with the new info from your edit, I'm thinking the previous suggestion is not ideal.
If the array is only 8-bytes long why not just cache a copy of the previous array and compare it to the new array received?
Below is a clarification of my comment about "shortcutting" the compares. How you implement this would depend on what the sizeof(int) is on the platform used.
Using a 64-bit integer you could get away with one compare to determine if the array has changed. For example:
#define ARR_SIZE 8
unsigned char cachedArr[ARR_SIZE];
unsigned char targetArr[ARR_SIZE];
unsigned int *ic = (unsigned int *)cachedArr;
unsigned int *it = (unsigned int *)targetArr;
// This assertion needs to be true for this implementation to work
// correctly.
assert(sizeof(int) == sizeof(cachedArr));
/*
** ...
** assume initialization and other suff here
** leading into the main loop that is receiving the target array data.
** ...
*/
if (*ic != *it)
{
// Target array has changed; find out which element(s) changed.
// If you only cared that there was a change and did not care
// to know which specific element(s) had changed you could forego
// this loop altogether.
for (int i = 0; i < ARR_SIZE; i++)
{
if (cachedArr[i] != targetArr[i])
{
// Do whatever needs to be done based on the i'th element
// changed
}
}
// Cache the array again since it has changed.
memcpy(cachedArr, targetArr, sizeof(cachedArr));
}
// else no change to the array
If the native integer size was smaller than 64-bit you could use the same theory, but you'd have to loop over the array sizeof(cachedArr) / sizeof(unsigned int) times; and there would be a worst-case scenario involved (but isn't there always) if the change was in the last chunk tested.
It should be noted that with doing any char to integer type casting you may need to take into consideration alignment (if the char data is aligned to the appropriate word-size boundary).
Thinking further upon this however, it might be better altogether to just unroll the loop yourself and do:
if (cachedArr[0] != targetArr[0])
{
doElement0ChangedWork();
}
if (cachedArr[1] != targetArr[1])
{
doElement1ChangedWork();
}
if (cachedArr[2] != targetArr[2])
{
doElement2ChangedWork();
}
if (cachedArr[3] != targetArr[3])
{
doElement3ChangedWork();
}
if (cachedArr[4] != targetArr[4])
{
doElement4ChangedWork();
}
if (cachedArr[5] != targetArr[5])
{
doElement5ChangedWork();
}
if (cachedArr[6] != targetArr[6])
{
doElement6ChangedWork();
}
if (cachedArr[7] != targetArr[7])
{
doElement7ChangedWork();
}
Again, depending on whether or not knowing which specific element(s) changed that could be tightened up. This would result in more instruction memory needed but eliminates the loop overhead (the good old memory versus speed trade-off).
As with anything time/memory related test, measure, compare, tweak and repeat until desired results are achieved.
only if the elements in an array have changed
Who else but you is going to change them? You can just keep track of whether you've made a change since the last time you did the operation.
If you don't want to do that (perhaps because it'd require recording changes in too many places, or because the record-keeping would take too much time, or because another thread or other hardware is messing with the array), just save the old contents of the array in a separate array. It's only 8 bytes. When you want to see whether anything has changed, compare the current array to the copy element-by-element.
As others have said, the elements will only change if the code changed them.
Maybe this data can be changed by another user? Otherwise you would know that you had changed an entry.
As far as the hash function, there are only 2^8 = 256 different values that this array can take. A hash function won't really help here. Also, a hash function has to be computed, which costs memory so I don't think that will work for your application.
I would just compare bits until you find one has changed. If one has changed, the you will check 4 bits on average before you that your array has changed (assuming that each bit is equally likely to change).
If one hasn't changed, that is worst case scenario and you will have to check all eight bits to conclude that none have changed.
If array only 8 bytes long, you can treat it as if it is a long long type number. Suppose original array is char data[8].
long long * pData = (logn long *)data;
long long olddata = *pData;
if ( olddata != *pData )
{
// detect which one changed
}
I mean, this way you operate all data in one shot, this is much faster than access each element using index. hash is slower n this case.
If it is byte oriented with only eight elements, doing an XOR function would be more efficient than any other comparison.
If ((LocalArray[0] ^ received Array [0]) & (LocalArray[1] ^ received Array [1]) & ...)
{
//Yes it is changed
}

Remove element from c array

I need to remove a specific element from an array, that array is dynamically resized in order to store an unknown number of elements with realloc.
To control the allocated memory and defined elements, I have two other variables:
double *arr = NULL;
int elements = 0;
int allocated = 0;
After some elements being placed in the array, I may need to remove some of them. All texts that I've found says to use memmove and reduce the variables by the number of elements removed.
My doubt is if this method is secure and efficient.
I think this is the most efficient function you can use (memcpy is not an option) regarding secured - you will need to make sure that the parameters are OK, otherwise bad things will happen :)
Using memmove is certainly efficient, and not significantly less secure than iterating over the array. To know how secure the implementation actually is, we'd need to see the code, specifically the memmove call and how return results from realloc are being checked.
If you get your memmove wrong, or don't check any realloc returns, expect a crash.
In principle, assuming you calculate your addresses and lengths correctly, you can use memmove, but note that if you overwrite one or more elements with the elements at higher indexes, and these overwritten elements were structs that contained pointers to allocated memory, you could produce leaks.
IOW, you must first take care of properly disposing the elements you are overwriting before you can use memmove. How you dispose them depends on what they represent. If they are merely structs that contain pointers into other structures, but they don't "own" the allocated memory, nothing happens. If the pointers "own" the memory, it must be deallocated first.
The performance of memmove() and realloc() can be increased by data partitioning. By data partitioning I mean to use multiple array chunk rather than one big array.
Apart from memmove(), I found memory swaping is efficient way. But there is drawback. The array order may be changed in this way.
int remove_element(int*from, int total, int index) {
if(index != (total-1))
from[index] = from[total-1];
return total-1; // return the number of elements
}
Interestingly array is randomly accessible by the index. And removing randomly an element may impact the indexes of other elements as well. If this remove is done in a loop traversal on the array, then the reordering may case unexpected results.
One way to fix that is to use a is_blank mask array and defer removal.
int remove_element(int*from, int total, int*is_valid, int index) {
is_blank[index] = 1;
return total; // **DO NOT DECREASE** the total here
}
It may create a sparse array. But it is also possible to fill it up as new elements are added in the blank positions.
Again, it is possible to make the array compact in the following efficient swap algorithm.
int sparse_to_compact(int*arr, int total, int*is_valid) {
int i = 0;
int last = total - 1;
// trim the last blank elements
for(; last >= 0 && is_blank[last]; last--); // trim blank elements from last
// now we keep swapping the blank with last valid element
for(i=0; i < last; i++) {
if(!is_blank[i])
continue;
arr[i] = arr[last]; // swap blank with the last valid
last--;
for(; last >= 0 && is_blank[last]; last--); // trim blank elements
}
return last+1; // return the compact length of the array
}
Note that the algorithm above uses swap and it changes the element order. May be it is preferred/safe to be used outside of some loop operation on the array. And if the indices of the elements are saved somewhere, they need to be updated/rebuilt as well.

changing the index of array

so far, i m working on the array with 0th location but m in need to change it from 0 to 1 such that if earlier it started for 0 to n-1 then now it should start form 1 to n. is there any way out to resolve this problem?
C arrays are zero-based and always will be. I strongly suggest sticking with that convention. If you really need to treat the first element as having index 1 instead of 0, you can wrap accesses to that array in a function that does the translation for you.
Why do you need to do this? What problem are you trying to solve?
Array indexing starts at zero in C; you cannot change that.
If you've specific requirements/design scenarios that makes sense to start indexes at one, declare the array to be of length n + 1, and just don't use the zeroth position.
Subtract 1 from the index every time you access the array to achieve "fake 1-based" indexing.
If you want to change the numbering while the program is running, you're asking for something more than just a regular array. If things only ever shift by one position, then allocate (n+1) slots and use a pointer into the array.
enum { array_size = 1000 };
int padded_array[ array_size + 1 ];
int *shiftable_array = padded_array; /* define pointer */
shiftable_array[3] = 5; /* pointer can be used as array */
some_function( shiftable_array );
/* now we want to renumber so element 1 is the new element 0 */
++ shiftable_array; /* accomplished by altering the pointer */
some_function( shiftable_array ); /* function uses new numbering */
If the shift-by-one operation is repeated indefinitely, you might need to implement a circular buffer.
You can't.
Well in fact you can, but you have to tweak a bit. Define an array, and then use a pointer to before the first element. Then you can use indexes 1 to n from this pointer.
int array[12];
int *array_starts_at_one = &array[-1]; // Don't use index 0 on this one
array_starts_at_one[1] = 1;
array_starts_at_one[12] = 12;
But I would advise against doing this.
Some more arguments why arrays are zero based can be found here. Infact its one of the very important and good features of the C programming language. However you can implement a array and start indexing from 1, but that will really take a lot of effort to keep track off.
Say you declare a integer array
int a[10];
for(i=1;i<10;i++)
a[i]=i*i;
You need to access all arrays with the index 1. Ofcourse you need to declare with the size (REQUIRED_SIZE_NORMALLY+1).
You should also note here that you can still access the a[0] element but you have to ignore it from your head and your code to achieve what you want to.
Another problem would be for the person reading your code. He would go nuts trying to figure out why did the numbering start from 1 and was the 0th index used for some hidden purpose which unfortunately he is unaware of.

Ideal data structure for mapping integers to integers?

I won't go into details, but I'm attempting to implement an algorithm similar to the Boyer-Moore-Horspool algorithm, only using hex color values instead of characters (i.e., there is a much greater range).
Following the example on Wikipedia, I originally had this:
size_t jump_table[0xFFFFFF + 1];
memset(jump_table, default_value, sizeof(jump_table);
However, 0xFFFFFF is obviously a huge number and this quickly causes C to seg-fault (but not stack-overflow, disappointingly).
Basically, what I need is an efficient associative array mapping integers to integers. I was considering using a hash table, but having a malloc'd struct for each entry just seems overkill to me (I also do not need hashes generated, as each key is a unique integer and there can be no duplicate entries).
Does anyone have any alternatives to suggest? Am I being overly pragmatic about this?
Update
For those interested, I ended up using a hash table via the uthash library.
0xffffff is rather too large to put on the stack on most systems, but you absolutely can malloc a buffer of that size (at least on current computers; not so much on a smartphone). Whether or not you should do it for this task is a separate issue.
Edit: Based on the comment, if you expect the common case to have a relatively small number of entries other than the "this color doesn't appear in the input" skip value, you should probably just go ahead and use a hash map (obviously only storing values that actually appear in the input).
(ignore earlier discussion of other data structures, which was based on an incorrect recollection of the algorithm under discussion -- you want to use a hash table)
If the array you were going to make (of size 0xFFFFFF) was going to be sparse you could try making a smaller array to act as a simple hash table, with the size being 0xFFFFFF / N and the hash function being hexValue / N (or hexValue % (0xFFFFFF / N)). You'll have to be creative to handle collisions though.
This is the only way I can foresee getting out of mallocing structs.
You can malloc(3) 0xFFFFFF blocks of size_t on the heap (for simplicity), and address them as you do with an array.
As for the stack overflow. Basically the program receives a SIGSEGV, which can be a result of a stack overflow or accessing illegal memory or writing on a read-only segment etc... They are all abstracted under the same error message "Segmentation fault".
But why don't you use a higher level language like python that supports associate arrays?
At possibly the cost of some speed, you could try modifying the algorithm to find only matches that are aligned to some boundary (every three or four symbols), then perform the search at byte level.
You could create a sparse array of sorts which has "pages" like this (this example uses 256 "pages", so the upper most byte is the page number):
int *pages[256];
/* call this first to make sure all of the pages start out NULL! */
void init_pages(void) {
for(i = 0; i < 256; ++i) {
pages[i] = NULL;
}
}
int get_value(int index) {
if(pages[index / 0x10000] == NULL) {
pages[index / 0x10000] = calloc(0x10000, 1); /* calloc so it will zero it out */
}
return pages[index / 0x10000][index % 0x10000];
}
void set_value(int index, int value) {
if(pages[index / 0x10000] == NULL) {
pages[index / 0x10000] = calloc(0x10000, 1); /* calloc so it will zero it out */
}
pages[index / 0x10000][index % 0x10000] = value;
}
this will allocate a page the first time it is touched, read or write.
To avoid the overhead of malloc you can use a hashtable where the entries in the table are your structs, assuming they are small. In your case a pair of integers should suffice, with a special value to indicate emptyness of the slot in the table.
How many values are there in your output space, i.e. how many different values do you map to in the range 0-0xFFFFF?
Using randomized universal hashing you can come up with a collision-free hash function with a table no bigger than 2 times the number of values in your output space (for a static table)

Resources