How to deal with a large sparse data file (1 & 0) while inputting to AMPL - sparse-matrix

I have an optimization model with a three-dimensional parameter matrix of binary values. More than 50% of this matrix is 0. Just reading the .dat file uses up my 4 GB of memory, and moving to a machine with more RAM is not an option.
param p_ijk{A,B,C};
How can I deal with this? Is there a way to index only the entries with value 1 and input those to AMPL?

You can specify a default value of 0 when declaring the parameter:
param p_ijk{A,B,C} default 0;
and provide only the nonzero values for it in the data. This way the zeros won't be stored, saving some memory.
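For example, assuming sets A, B and C contain members such as a1, b1, c1 (names made up for illustration), the .dat file then only needs to list the triples whose value is 1:

param p_ijk :=
a1 b1 c1  1
a1 b2 c3  1
a2 b1 c2  1 ;

Every combination not listed takes the declared default of 0, so the file size and the memory needed to read it shrink roughly in proportion to the number of zeros.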

Related

Look up table with fixed sized bit arrays as keys

I have a series of fixed-size arrays of binary values (individuals from a genetic algorithm) that I would like to associate with a floating point value (fitness value). Such a lookup table would be fairly large, its size constrained by available memory. Given the nature of the keys, is there a hash function that would guarantee no collisions? I tried a few things but they result in collisions. What other data structure could I use to build this lookup system?
To answer your questions:
There is no hash function that guarantees no collisions unless you make a hash function that encodes the bit array completely, meaning that given the hash you can reconstruct the bit array. Such a function would be a compression function. If your arrays contain a lot of redundant information (for example, most of the values are zeros), compressing them could be useful to reduce the total size of the lookup table.
A question on compressing a bit array in C is answered here: Compressing a sparse bit array
Since most of your bits are set to zero, the easiest solution would be to write a function that converts your bit array into an integer array holding the positions of the bits that are set to 1, and a function that does the opposite if you need the bit array again. You can then store only the encoded array in the hash map.
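As a rough sketch in C (assuming the bit array is given as an array of 0/1 bytes; the names are illustrative), the encoding just collects the indices of the set bits:

#include <stddef.h>

/* Store the indices of the bits set to 1 in out[] and return how many there are.
   out must have room for up to n entries. */
size_t encode_positions(const unsigned char *bits, size_t n, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (bits[i])
            out[count++] = i;
    return count;
}

/* Rebuild the original bit array from the stored positions. */
void decode_positions(const size_t *pos, size_t count, unsigned char *bits, size_t n)
{
    for (size_t i = 0; i < n; i++)
        bits[i] = 0;
    for (size_t i = 0; i < count; i++)
        bits[pos[i]] = 1;
}

The position list (plus its length) is what you would then store in the hash map instead of the full bit array.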
Another option to reduce the total size of the lookup table is to erase old values. Since you are using a genetic algorithm, the population changes over time and old values become useless, so you could periodically remove the oldest entries from the lookup table.

Sparse Multidimensional Array taking huge space - HashTable better?

Is there a better approach than using multidimensional arrays to compute values to be displayed in a table? Please note that each dimension of the array is huge, but the data is sparse. Can something like a hash table be considered?
The output table after the computation looks like this: (table image omitted)
This answer is outdated because the OP added the information that the data is a sparse matrix.
Not really. Maybe a one-dimensional array (it would save the pointers to the dimensions - but that's, err... pointless).
An array is the data structure with the least metadata (because there is no metadata at all). So your approach can't be optimized much if you really need to store all that data in memory.
Any other data structure (tree, linked lists, etc.) would contain extra metadata and would therefore consume more memory.
The only way for you to use less memory is to actually use less memory (by loading only the data you really need into memory and leaving the rest on your hard drive or wherever).
You want to display a table, so maybe you can limit the rows you save in memory to an area slightly bigger than the viewport of your table (so you can scroll through the table fluently). Then you can dynamically compute and overwrite the rows according to the scroll state of your table.
There are a number of different ways to manage memory for a sparse matrix. I would start by defining a struct to hold an individual entry in your matrix:
struct sparse_matrix_data {
    int i;      /* row index */
    int j;      /* column index */
    int /* or double or whatever */ value;
};
so that you store the two indices and the value for each non-zero entry. From there, you need to decide which data structure works best for the computations you need to do: a hash table on one or both indices, an array of these structs, a linked list, ...
Note that this will only decrease the memory required if the additional memory required to store the indices is less than the memory you used to store the zeros in your original multidimensional array.
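As a minimal illustration (the function name is made up, and the plain array of structs is just one of the options listed above), a lookup could look like:

#include <stddef.h>

/* struct sparse_matrix_data as defined above */
struct sparse_matrix_data {
    int i;
    int j;
    int value;
};

/* Return the value stored at (i, j), or 0 if that entry is not stored explicitly. */
int sparse_get(const struct sparse_matrix_data *entries, size_t count, int i, int j)
{
    for (size_t k = 0; k < count; k++)
        if (entries[k].i == i && entries[k].j == j)
            return entries[k].value;
    return 0;
}

A hash table keyed on the pair (i, j) would replace the linear scan with an average constant-time lookup, at the cost of some extra memory per entry.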

signrank test in a three-dimensional array in MATLAB

I have a 60x60x35 array and would like to run the Wilcoxon signed rank test to determine whether the median of each element across the third array dimension (i.e. over 35 values) is different from zero. I would like the results as two 60x60 arrays: one with values of 0 and 1 depending on the test outcome, and a separate array with the corresponding p values.
The problem I am facing is specifying the call so that the output has the appropriate dimensions and the test is computed across the correct dimension of the array.
Thanks for your help and all the best!
So one way to solve your problem is using a nested for-loop. Let's say your data is stored in data:
data = rand(60,60,35);
size_data = size(data);

% Preallocate the results and fill them with NaN so untouched cells stand out.
p = NaN(size_data(1), size_data(2));
h = NaN(size_data(1), size_data(2));

for k = 1:size_data(1)
    for l = 1:size_data(2)
        tmp_data = data(k,l,:);                           % 1x1x35 slice
        tmp_data = reshape(tmp_data, 1, numel(tmp_data)); % flatten to a row vector
        [p(k,l), h(k,l)] = signrank(tmp_data);
    end
end
What I am doing is preallocating p and h as 60x60 matrices filled with NaN, so you can easily see if something went wrong (0 would be an acceptable result). Then I loop over all elements and store the current data slice in a new variable. signrank needs the data to be a vector, so I reshape the 1x1x35 slice into a row vector.
I guess you could skip those loops by using bsxfun

packing two single precision values into one

I'm working in LabVIEW with very constrained RAM. I have two arrays that require single precision since I need decimal values. However, single precision takes too much space for what I have; the values I work with are within 0.00-1000.00.
Is there an intuitive way to pack these two arrays together so I can save some space? Or is there a different approach I can take?
If you need to represent 0.00 - 1000.00 at two decimal places, you've got 100000 values. That cannot be represented in fewer than 17 (whole) bits. That means that to fit two numbers in, you'll need 34 bits, which is obviously more than you can fit in a 32-bit space. I suggest you limit your space of values. You could dedicate 11 bits to the integer value (0 - 2047) and 5 bits to the decimal value (0 to 0.96875 in steps of 1/32, i.e. 0.03125). Then you'll be able to fit two values into one 32-bit space.
Just remember that the extra bit manipulation you have to do for this is likely to have a small performance impact on your application.
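As an illustration of the idea in C (LabVIEW would express the same thing with its numeric and logical primitives; the function names are made up), each value can be treated as 16-bit fixed point with 5 fractional bits, which is the same 11 + 5 split:

#include <stdint.h>
#include <math.h>

/* Quantise a value in 0.00 - 1000.00 to steps of 1/32 (0.03125).
   1000 * 32 = 32000, so the result fits in 16 bits (11 integer + 5 fraction bits). */
static uint16_t encode_fixed(float x)
{
    return (uint16_t)lroundf(x * 32.0f);
}

static float decode_fixed(uint16_t v)
{
    return v / 32.0f;
}

/* Two encoded values share one 32-bit word. */
static uint32_t pack_pair(float a, float b)
{
    return ((uint32_t)encode_fixed(a) << 16) | encode_fixed(b);
}

static void unpack_pair(uint32_t w, float *a, float *b)
{
    *a = decode_fixed((uint16_t)(w >> 16));
    *b = decode_fixed((uint16_t)(w & 0xFFFFu));
}

Note that the quantisation error is up to 1/64, so values are no longer exact to two decimal places; that is the precision traded for halving the storage.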
First of all, it would be good general advice to double-check that you've correctly understood how LabVIEW stores data in memory and whether any of your VIs are using more memory than they need to.
If you still need to squeeze this data into the minimum space, you could do something like:
Instead of a 1D array of n values, use a 2D array of ceiling(n/16) x 17 U16s. Each of the 17 U16s in a row holds one bit from each of the 16 data values stored in that row.
To read value m from the array, get the 17 U16s from row m/16, take bit (m MOD 16) from each of them, and combine those 17 bits to reconstruct the value.
To write to the array, get the relevant 17 U16s, replace bit (m MOD 16) of each with the corresponding bit of the new value, and put the changed U16s back into the array.
I guess this won't be fast but maybe you can optimise it for the particular operations you need to do on this data.
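Sketched in C rather than LabVIEW (words[row][b] is assumed to hold bit b of each of the 16 values stored in that row), the read and write described above amount to:

#include <stdint.h>
#include <stddef.h>

/* Read value m: collect bit (m MOD 16) from each of the 17 U16s in row m/16. */
static uint32_t read_value(const uint16_t words[][17], size_t m)
{
    size_t row = m / 16;
    unsigned pos = m % 16;
    uint32_t value = 0;
    for (unsigned b = 0; b < 17; b++)
        value |= (uint32_t)((words[row][b] >> pos) & 1u) << b;
    return value;   /* the reassembled 17-bit value */
}

/* Write value m: clear and re-set bit (m MOD 16) of each of the 17 U16s. */
static void write_value(uint16_t words[][17], size_t m, uint32_t value)
{
    size_t row = m / 16;
    unsigned pos = m % 16;
    for (unsigned b = 0; b < 17; b++) {
        words[row][b] &= (uint16_t)~(1u << pos);
        words[row][b] |= (uint16_t)(((value >> b) & 1u) << pos);
    }
}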
Alternatively, could you perhaps use some sort of data compression? I imagine that would work best if you can organise the data into 'pages' containing some set number of values. For example, you could take a 1D array of SGL, flatten it to a string, then apply the compression to the string, and store the compressed string in a string array. I believe OpenG includes zip tools, for example.

PostgreSQL Array Structure

What is the layout of a Postgres array stored in memory? How can I get at the real data?
For example, for array[0.1, 0.2, 0.3]::float8[], is the real data (0.1, 0.2, 0.3) stored like a standard C array? Could I use memcpy to copy an existing array? Does the pointer we get from ARR_DATA_PTR refer to the real data?
PostgreSQL uses a variable-length C structure: the first bytes contain a fixed-length header and the following bytes hold the data. The first four bytes hold the total length, followed by the number of dimensions, the data offset and the OID identifying the element data type; next comes an optional bitmap marking NULLs, and after this bitmap the element data are serialised.
A PostgreSQL array is not compatible with a C array, although in some cases a C array forms part of the PostgreSQL array. ARR_DATA_PTR may or may not refer to the real data, depending on the current state of the value (it may be TOASTed, detoasted, ...).
People usually use macros and supporting functions when working with PostgreSQL arrays. There are functions for unpacking them into C arrays and for iterating over a PostgreSQL array.
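For illustration, a server-side C function (the function name is made up) can let deconstruct_array handle the layout details instead of reading ARR_DATA_PTR directly:

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"
#include "catalog/pg_type.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sum_float8_array);

/* Rough sketch: sum a float8[] using the supported accessors. */
Datum
sum_float8_array(PG_FUNCTION_ARGS)
{
    ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(0);   /* detoasts if necessary */
    Datum      *elems;
    bool       *nulls;
    int         nelems;
    int         i;
    float8      sum = 0.0;

    deconstruct_array(arr, FLOAT8OID, sizeof(float8), FLOAT8PASSBYVAL, 'd',
                      &elems, &nulls, &nelems);

    for (i = 0; i < nelems; i++)
        if (!nulls[i])
            sum += DatumGetFloat8(elems[i]);

    PG_RETURN_FLOAT8(sum);
}

deconstruct_array copies the elements out while honouring the NULL bitmap and element alignment, which is why it is usually preferred over walking the raw data pointer yourself.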
