There is also a counterpart, which is called a dense array. What does this mean? I have done some searching, but didn't find accurate information.
Say you have a structure
struct SomeStruct {
int someField;
int someUselessField;
int anotherUselessField;
};
and an array
struct SomeStruct array[10];
Then if you look at all the someFields in this array, they can be considered an array of their own, but they're not occupying consecutive memory cells, so this array is strided. A stride here is sizeof(SomeStruct), i.e. the distance between two consecutive elements of the strided array.
A sparse array mentioned here is a more general and actually different concept: a strided array doesn't contain zeroes in the skipped memory cells; those cells are just not part of the array.
A strided array is a generalization of the usual (dense) array to the case where stride != sizeof(element).
If you want to operate on a subset of a 2D array, you need to know the 'stride' of the array. Suppose you have:
int array[4][5];
and you want to operate on the subset of the elements from array[1][1] to array[2][3].
Pictorially, this is the core of the diagram below:
+-----+-----+-----+-----+-----+
| 0,0 | 0,1 | 0,2 | 0,3 | 0,4 |
+-----+=====+=====+=====+-----+
| 1,0 [ 1,1 | 1,2 | 1,3 ] 1,4 |
+-----+=====+=====+=====+-----+
| 2,0 [ 2,1 | 2,2 | 2,3 ] 2,4 |
+-----+=====+=====+=====+-----+
| 3,0 | 3,1 | 3,2 | 3,3 | 3,4 |
+-----+-----+-----+-----+-----+
To access the subset of the array correctly in a function, you need to tell the called function the stride of the array:
int summer(int *array, int rows, int cols, int stride)
{
    int sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += array[i * stride + j];
    return sum;
}
and the call:
int sum = summer(&array[1][1], 2, 3, 5);
To stride is to "take long steps" (thefreedictionary.com/stride).
For an array this would mean that only some of the elements are present, like just every 10th element. You can then save space by not storing the empty elements in between.
A dense array would be one where many, if not all, elements are present so there is no empty space between the elements.
I'm adding yet another answer here since I didn't find any of the existing ones satisfactory.
Wikipedia explains the concept of stride, and also writes that “stride cannot be smaller than element size (it would mean that elements are overlapping) but can be larger (indicating extra space between elements)”.
However, from the information I've found, strided arrays allow for exactly this: they conserve memory by allowing the stride to be zero or negative.
Strided arrays
Compiling APL to JavaScript explains strided arrays as a way to represent multidimensional arrays with both data and stride, unlike the typical "rectangular" representation of arrays that assumes an implicit stride of 1. It allows positive, negative, and zero strides. Why? Because many operations can then alter only the stride and shape, not the underlying data, allowing efficient manipulation of large arrays.
The advantage of this strided representation becomes apparent when working with large volumes of data. Functions like transpose (⍉⍵), reverse (⌽⍵), or drop (⍺↓⍵) can reuse the data array and only need to give a new shape, stride, and offset to their result. A reshaped scalar, e.g. 1000000⍴0, can occupy only a constant amount of memory, exploiting the fact that strides can be 0.
I haven't worked out exactly how these operations would be implemented as operations on the stride and shape, but it's easy to see that altering only these instead of the underlying data would be much cheaper in terms of computation. However, it's worth keeping in mind that a strided representation might impact cache locality negatively, so depending on the use case it might be better to use regular rectangular arrays instead.
Possibility 1: Stride describes a buffer array to read an optimized array
When you use a method to store multidimensional arrays in linear storage, the stride describes the size in each dimension of a buffer that helps you read that array. (Image taken from Nd4j; see there for more info about stride.)
Possibility 2 (lower level): Stride is the distance between contiguous members of an array
It means that the addresses of items with index 0 and 1 won't be contiguous in memory unless you use a unit stride. A bigger value will place the items more distant in memory.
This is useful at low level (word length optimization, overlapping arrays, cache optimization). Refer to wikipedia.
In highly optimized code, one reasonably common technique is to insert padding into arrays. It means that the Nth logical element no longer sits at offset N*sizeof(T). The reason this can be an optimization is that some caches are associativity-limited: they can't cache both array[i] and array[j] for some pairs i, j. If an algorithm operating on a dense array uses many such pairs, inserting some padding might reduce this.
A common case where this happens is in image processing. An image often has a line width of 512 bytes or another "binary round number", and many image-manipulation routines use the 3x3 neighborhood of a pixel. As a result, you can get quite a few cache evictions on some cache architectures. By inserting a "weird" number of fake pixels (e.g. 3) at the end of each line, you change the "stride" and there's less cache interference between adjacent lines.
This is very CPU-specific so there's no general advice here.
Related
Suppose I have n sorted integer arrays (a_1, ..., a_n, there may be duplicated elements in a single array), and T is a threshold value between 0 and 1. I would like to find all pairs of arrays the similarity of which is larger than T. The similarity of array a_j w.r.t. array a_i is defined as follows:
sim(i, j) = intersection(i, j) / length(i)
where intersection(i, j) returns the number of elements shared in a_i and a_j, and length(i) returns the length of array a_i.
I can enumerate all pairs of arrays and compute the similarity value, but this takes too much time for a large n (say n=10^5). Is there any data structure, pruning strategy, or other techniques that can reduce the time cost of this procedure? I'm using Java so the technique should be easily applicable in Java.
There are (n^2 - n)/2 pairs of arrays. If n=10^5, then you have to compute the similarity of 5 billion pairs of arrays. That's going to take some time.
One potential optimization is to short-circuit your evaluation of two arrays once it becomes clear that you can't reach T. For example, if T is 0.5 and you've examined more than half of the first array without finding any intersections, then it's clear that that pair of arrays won't meet the threshold. I don't expect this optimization to gain you much.
It might be possible to make some inferences based on prior results. That is, if sim(1,2) = X and sim(1,3) < T, there's probably a value of X (likely would have to be very high) at which you can say definitively that sim(2,3) < T.
I need to make an algorithm (formula, function) using AND, OR, XOR, NEG, SHIFT, NOT, etc. which calculates an element of the array from an index.
the size of the element is one byte
e.g. element = index ^ constant, where the constant is array[index] ^ index (previously calculated).
This will work only if the array size is less than 256.
How do I make a byte from an index when the index is bigger than 1 byte?
The same way; however, there will be duplicates, as there are only 256 possible values in a BYTE, so if your array is bigger than 256 there must be duplicates.
To avoid obvious mirroring you cannot use monotonic functions. For example,
value[ix] = ix
is monotonic, so it will produce a sawtooth-like shape, mirroring the content of the array every 256 bytes. To avoid this you need to combine more stuff together. It's similar to building your own pseudo-random generator. The usual approaches are:
modular arithmetics
something like:
value[ix] = ((c0*ix + c1*ix*ix + c2*ix*ix*ix) % prime) & 255
If the constants c0, c1, c2 and prime are big enough, the output looks random, so far fewer repeated patterns will be visible in the output... But you need to use arithmetic of a bit width that can hold the prime...
In case you are hitting the upper bounds of your arithmetic's bit width, you need to use modmul and modpow to avoid overflows. See:
Modular arithmetics and NTT (finite field DFT) optimizations
swapping bits
Simply do some math on your ix where you also use ix with its bits swapped. That will change the monotonic properties a lot... However, this approach works best on a cumulative sub-result, which is not your case. I would try:
value[ix] = ( ix + ((ix<<3)*5) - ((ix>>2)*7) + ((3*ix)^((ix<<4)|(ix>>4))) ) & 255
Playing with the constants and operators achieves different results. However, with this approach you need to check validity (which I did not!), so render a graph of the first few values (say 1024), where the x axis is ix and the y axis is value[ix]. There you should see whether the output repeats or even saturates towards some value; if it does, change the equation.
for more info see How to seed to generate random numbers?
Of course, after all this it's not possible to recover ix from value[ix]...
I have an array of 64-bit numbers where each bit represents an index into an array we will call A.
The data in the rows is STATIC and preprocessing time is not important (within reason), the data in the array A varies.
There are about 32 million rows to process, so I am trying to find how to get the fastest execution time.
The process is simple; For example, if I have the following numbers:
Row[0] = ...001110, Result[0] = A[1] | A[2] | A[3]
Row[1] = ...001101, Result[1] = A[0] | A[2] | A[3]
The method I thought about would find the largest bit patterns in the list and store temporary results:
For example, we see the 001100 pattern repeated, so we could save some steps:
...001110 => calculate A[1], A[2], A[3], store B[0] = (A[2] | A[3]), result = A[1] | B[0]
...001101 => calculate A[0], result = A[0] | B[0]
...110001 etc...
Ideally, I could make a tree that tells me how to build every possible combination in use (maybe treating the rows by chunk if it gets too large).
Another example:
...0011010
...1101010
...1100010
...1000110
I can calculate 1100000 and 0001010 one time and reuse the results 2 times, but 0100010 can be reused 3 times.
This is to speed up the multiplication of boolean matrices; I am assuming there are algorithms that already exist.
If there are enough repetitions (from your question I assume there are) that caching would speed things up, one way to do this would be to use a hash to cache the patterns that have already been calculated.
Something like this (pseudocode):
for (Row)
{
    Pattern := Patterns[Row];
    if (Exists(Hash(Pattern)))
    {
        Row_result := Hash(Pattern);
    }
    else
    {
        Row_result := Calculate(Pattern);
        Hash(Pattern) := Row_result;
    }
}
Now back to real life: you've said that the patterns are 64 bits long. If there are a lot of different patterns, the hash memory requirements would be huge. To mitigate this I suggest splitting each pattern into two halves and using two hashes, the first for the upper half and the second for the lower half of the pattern. Then Row_result := Hash1(Upper_half) | Hash2(Lower_half). Having done this, you'll lower memory consumption to a manageable few gigabytes in the worst case. You can go further and use 4 hashes to make it even lower.
This solution seems a bit too obvious, so I may be missing something, but ...
If each element in A is identified by a bit in Row, and Row is of type uint64, then it follows that A can have at most 64 elements.
If elements of A are in turn Boolean values, as your examples suggest, why not simply have A as a uint64 as well, rather than an array of bool?
Then you could simply do
Result[i] = (Row[i] & A) != 0;
which would be really hard to beat performance-wise.
I am relatively new to C and am just learning about ways that memory is stored during a program. Can someone please explain why the following code:
#include <stdio.h>

int main(int argc, char** argv)
{
    float x[3][4];
    printf("%p\n%p\n%p\n%p\n", &(x[0][0]), &(x[2][0]), &(x[2][4]), &(x[3][0]));
    return 0;
}
outputs this:
0x7fff5386fc40
0x7fff5386fc60
0x7fff5386fc70
0x7fff5386fc70
Why would the first 3 be different places in memory but the last be the same as the third?
Why is there a gap of 0x20 between the first two, but a gap of only 0x10 between the second and third? The distance between &(x[2][0]) and &(x[2][4]) doesn't seem like it should be half the distance between &(x[0][0]) and &(x[2][0]).
Thanks in advance.
When you declare an array of size n, the indices range from 0 to n - 1. So x[2][4] and x[3][0] are actually stepping outside the bounds of your arrays.
If you weren't already aware, the multidimensional array you declared is actually an array of arrays.
Your compiler is laying out each array one after the other in memory. So, in memory, your elements are laid out in this order: x[0][0], x[0][1], x[0][2], x[0][3], x[1][0], x[1][1], and so on.
It looks like you already understand how pointers work, so I'll gloss over that. The reason the last two addresses are the same is that x[2][4] is out of bounds, so it refers to the next slot in memory after the end of the x[2] array. That would be the first element of the x[3] array, if there were one, which would be x[3][0].
Now, since x[3][0] refers to an address that you don't have a variable mapping to, it's entirely possible that dereferencing it could cause a segmentation fault. In the context of your program, there just happens to be something stored at 0x7fff5386fc70; in other words, you got lucky.
This is due to pointer arithmetic.
Your array is flat, which means that data are stored in a linear way, each one after the other in memory. First [0][0] then [0][1], etc.
The address of [x][y] is calculated as (x*4+y)*float_size+starting_address.
So the gap between the first two, [0][0] and [2][0], is 8*float_size. That difference is 0x20 in hexadecimal, which is 32 in decimal, so float_size is 4.
Between the second and third you have ((2*4+4) - (2*4)) * float_size, which is 16 in decimal, so 0x10 in hexadecimal. This is exactly half the previous gap because it spans one row's worth of 4 elements (within the third row), while the previous gap spans two rows' worth of 8 elements (the first and second rows).
Arrays are linear data structures. Irrespective of their dimension (1-dimensional, 2-dimensional, or 3-dimensional), they are arranged linearly in memory.
Your x[3][4] will be stored in memory as consecutive fixed sized cells like :
| (0,0) | (0, 1) | (0,2) | (0,3) | (1,0) | (1,1) | (1,2) | (1,3) | (2,0) | (2,1) | (2,2) | (2,3) |
This x[0][0] notation is matrix notation; at compile time it is converted to pointer arithmetic. The offset (in elements) of x[i][j] is y * i + j, where y, the row length, is 4 in your case.
Calculating this way, the outputs make sense.
Array elements in C are stored contiguously, in row-major order. So, in your example, &x[row][column] is exactly equal to (char *)&x[0][0] + ((row * 4) + column) * sizeof(float) (when those addresses are treated as byte addresses, which is what you're outputting).
The third address you're printing has the second index out of bounds (valid values 0 to 3), and the fourth has the first index out of bounds (valid values 0 to 2). It just happens that the values you've chosen work out to the same location in memory, because the rows are laid out in memory end-to-end.
There are 8 elements between &(x[0][0]) and &(x[2][0]). The actual difference in memory is multiplied by sizeof(float) which, for your compiler, is 4. 4*8 is 32 which, when printed as hex, is 0x20, is the difference you're seeing.
If you picked values of row and column where (row * 4) + column was 12 (= 3*4) or more, your code would be computing the address of something outside the array. Attempting to use such a pointer (e.g. setting the value at that address) would give undefined behaviour. You just got lucky that the indices you picked happen to land within the array.
For my university process I'm simulating a process called random sequential adsorption.
One of the things I have to do involves randomly depositing squares (which cannot overlap) onto a lattice until there is no more room left, repeating the process several times in order to find the average 'jamming' coverage %.
Basically I'm performing operations on a large array of integers, of which 3 possible values exist: 0, 1 and 2. The sites marked with '0' are empty, the sites marked with '1' are full. Initially the array is defined like this:
int i, j;
int n = 1000000000;
int array[n][n];

for (j = 0; j < n; j++)
{
    for (i = 0; i < n; i++)
    {
        array[i][j] = 0;
    }
}
Say I want to deposit 5*5 squares randomly on the array (such that they cannot overlap), with the squares represented by '1's. This would be done by choosing the x and y coordinates randomly and then creating a 5*5 square of '1's with the top-left point of the square starting at that point. I would then mark sites near the square as '2's. These represent the sites that are unavailable, since depositing a square at those sites would cause it to overlap an existing square. This process would continue until there is no more room left to deposit squares on the array (basically, no more '0's left in the array).
Anyway, to the point. I would like to make this process as efficient as possible, by using bitwise operations. This would be easy if I didn't have to mark sites near the squares. I was wondering whether creating a 2-bit number would be possible, so that I can account for the sites marked with '2'.
Sorry if this sounds really complicated, I just wanted to explain why I want to do this.
You can't create a datatype that is 2-bits in size since it wouldn't be addressable. What you can do is pack several 2-bit numbers into a larger cell:
struct Cell {
    unsigned a : 2;
    unsigned b : 2;
    unsigned c : 2;
    unsigned d : 2;
};
This specifies that each of the members a, b, c and d should occupy two bits in memory.
EDIT: This is just an example of how to create 2-bit variables, for the actual problem in question the most efficient implementation would probably be to create an array of int and wrap up the bit fiddling in a couple of set/get methods.
Instead of a two-bit array you could use two separate 1-bit arrays. One holds filled squares and one holds adjacent squares (or available squares if this is more efficient).
I'm not really sure that this has any benefit though over packing 2-bit fields into words.
I'd go for byte arrays unless you are really short of memory.
The basic idea
Unfortunately, there is no way to do this in C. You can create arrays of 1 byte, 2 bytes, etc., but you can't create arrays of bits.
The best thing you can do, then, is to write a new library for yourself, which makes it look like you're dealing with arrays of 2 bits, but in reality does a lot of hard work. The same way that the string libraries give you functions that work on "strings" (which in C are just arrays), you'll be creating a new library which works on "bit arrays" (which in reality will be arrays of integers, with a few special functions to deal with them as-if they were arrays of bits).
NOTE: If you're new to C, and haven't learned the ideas of "creating a new library/module", or the concept of "abstraction", then I'd recommend learning about them before you continue with this project. Understanding them is IMO more important than optimizing your program to use a little less space.
How to implement this new "library" or module
For your needs, I'd create a new module called "2-bit array", which exports functions for dealing with the 2-bit arrays, as you need them.
It would have a few functions that deal with setting/reading bits, so that you can work with it as if you have an actual array of bits (you'll actually have an array of integers or something, but the module will make it seem like you have an array of bits).
Using this module would look something like this:
// This is just an example of how to use the functions in the twoBitArray library.
twoB my_array = Create2BitArray(size); // This will "create" a twoBitArray and return it.
SetBit(my_array, 5, 1);                // Set cell 5 to 1.
bit b = GetBit(my_array, 5);           // Where bit is typedef'd to an int by your module.
What the module will actually do is implement all these functions using regular-old arrays of integers.
For example, the function GetBit(), for GetBit(my_arr, 17), will calculate which integer of the underlying array holds cell 17 and at which bit offset within it (depending on sizeof(int); with 16 two-bit cells per 32-bit integer, it's the second cell of the second integer), and return it using bitwise operations.
You can compact one dimension of the array into sub-integer cells. With four 2-bit cells per byte, converting a coordinate (let's say x) to a position inside a byte looks like:
byte cell  = array[i][x / 4];
byte shift = 2 * (x % 4);
byte mask  = 0x03 << shift;
byte data  = (cell & mask) >> shift;
To write data, do the reverse.