Why is mxCreateNumericMatrix maximum size smaller than system maximum array size? - arrays

I am trying to create a matrix in a MEX function. The following works:
uint64_t N;
N = 2147483647; // N = 2*2^30 -1
plhs[0] = mxCreateNumericMatrix(N,1,mxUINT8_CLASS,mxREAL);
However, I am unable to create an array that is this size:
uint64_t N;
N = 2147483648; // N = 2*2^30
plhs[0] = mxCreateNumericMatrix(N,1,mxUINT8_CLASS,mxREAL);
The preceding code throws the error:
maximum variable size allowed by the function exceeded
This is confusing, since my system (64-bit Linux running 64-bit MATLAB R2010b) tells me that the maximum array size is, in fact, very large:
[~,M] = computer
M =
281474976710655 % 2^48 -1 for those of you keeping track
Furthermore, from the command line I am able to create very large arrays, and have been doing so quite happily for some time, with calls like the following:
a = zeros(16*2^30,1,'uint8');
disp(uint64(numel(a)))
17179869184
My question is, why am I not able to create arrays in my mex function that I am clearly able to create from the command line, or from other *.m functions?
Thank you.
P.S. - I have also asked this question in the Mathworks forum. I figured I'd cast as large a net as possible. If it is answered there first, I'll post it here.

The answer lies in the compiler options. By default, MEX files are built against the 32-bit array API, which limits array dimensions to 2^31-1. To use the 64-bit API (and thus larger arrays), the following option must be included in your mex compile command:
mex -largeArrayDims myFunction.c
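For example, a minimal gateway built this way (the file name and the requested size here are only illustrative) might look like:
/* makeBigVector.c - sketch only; build with: mex -largeArrayDims makeBigVector.c */
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    /* With -largeArrayDims, mwSize is 64 bits wide, so dimensions
       beyond 2^31-1 are representable. */
    mwSize N = (mwSize)17179869184ULL;   /* 16*2^30 elements, matching the command-line example above */
    plhs[0] = mxCreateNumericMatrix(N, 1, mxUINT8_CLASS, mxREAL);
}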

Related

Error when using CVX package with large sparse matrix

The error description is as follows:
Error using full
Requested 68813x68813 (35.3GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
Error in schurmat_sblk (line 35)
if issparse(schur); schur = full(schur); end;
The function file schurmat_sblk is a file in cvx\sdpt3\Solver.
What can I do to avoid this error?
My cvx codes are as follows:
The values you may need are: n = 8; d = 2^n; m = (d^2)*0.05. Pauli is an m-by-d^2 sparse matrix, and y is m-by-1.
function [rhoE] = test_compressed_cc(n,~,m,Pauli,y)
d = 2^n;
cvx_begin sdp quiet
    % how to define the variable?
    variable rhoE(d,d) hermitian;
    rhoE == hermitian_semidefinite(d);
    % ||x||_tr = tr(sqrt(x^\dagger x)) = tr(sqrt(x^2)) = tr(x)
    minimize(trace(rhoE));
    subject to
        (d/m)*(Pauli * vec(rhoE)) == y;
        rhoE >= 0;
cvx_end
end
On the other hand, maybe CVX can't solve the 8-qubit case; does anyone know how SVT should be used to solve this convex program?
Paper link: https://arxiv.org/pdf/0909.3304.pdf
Any comments are welcome :)

Need suggestion on Code conversion to Matlab_extension 2

This is an extension of the previously asked question: link. In short, I am trying to convert a C program into MATLAB and am looking for your suggestions for improving the code, as it is not giving the correct output. Did I convert the XOR in the best way possible?
C Code:
void rc4(char *key, char *data){
    : // Other parts of the program
    :
    :
    i = j = 0;
    int k;
    for (k = 0; k < strlen(data); k++){
        :
        :
        has[k] = data[k] ^ S[(S[i]+S[j]) % 256];
    }
}
int main()
{
    char key[] = "Key";
    char sdata[] = "Data";
    rc4(key, sdata);
}
Matlab code:
function has = rc4(key, data)
://Other parts of the program
:
:
i=0; j=0;
for k=0:length(data)-1
:
:
out(k+1) = S(mod(S(i+1)+S(j+1), 256)+1);
v(k+1)=double(data(k+1))-48;
C = bitxor(v,out);
data_show =dec2hex(C);
has = data_show;
end
It looks like you're doing bitwise XOR on 64-bit doubles. [Edit: or not, seems I forgot bitxor() will do an implicit conversion to integer - still, an implicit conversion may not always do what you expect, so my point remains, plus it's far more efficient to store 8-bit integer data in the appropriate type rather than double]
To replicate the C code, if key, data, out and S are not already the correct type you can either convert them explicitly - with e.g. key = int8(key) - or if they're being read from a file even better to use the precision argument to fread() to create them as the correct type in the first place. If this is in fact already happening in the not-shown code then you simply need to remove the conversion to double and let v be int8 as well.
Second, k is being used incorrectly - Matlab arrays are 1-indexed so either k needs to loop over 1:length(data) or (if the zero-based value of k is used as i and j are) then you need to index data by k+1.
(side note: who is x and where did he come from?)
Third, you appear to be constructing v as an array the same size of data - if this is correct then you should take the bitxor() and following lines outside the loop. Since they work on entire arrays you're needlessly repeating this every iteration instead of doing it just once at the end when the arrays are full.
As a general aside, since converting C code to Matlab code can sometimes be tricky (and converting C code to efficient Matlab code very much more so), if it's purely a case of wanting to use some existing non-trivial C code from within Matlab then it's often far easier to just wrap it in a MEX function. Of course if it's more of a programming exercise or way to explore the algorithm, then the pain of converting it, trying to vectorise it well, etc. is worthwhile and, dare I say it, (eventually) fun.
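For instance, a rough sketch of such a wrapper for the rc4() routine above (the extra output-buffer argument and the file names are my assumptions about the not-shown code) could look like:
/* rc4_mex.c - sketch only; build with: mex rc4_mex.c rc4.c */
#include <string.h>
#include "mex.h"

void rc4(char *key, char *data, char *out);   /* assumed signature of the existing C routine */

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2)
        mexErrMsgTxt("Usage: out = rc4_mex(key, data)");

    /* Copy the MATLAB char inputs into NUL-terminated C strings. */
    char *key  = mxArrayToString(prhs[0]);
    char *data = mxArrayToString(prhs[1]);
    size_t n   = strlen(data);

    char *out = mxMalloc(n);
    rc4(key, data, out);                      /* call the existing C implementation */

    /* Return the result as a uint8 row vector (it may contain zero bytes). */
    plhs[0] = mxCreateNumericMatrix(1, (mwSize)n, mxUINT8_CLASS, mxREAL);
    memcpy(mxGetData(plhs[0]), out, n);

    mxFree(key);
    mxFree(data);
    mxFree(out);
}
From MATLAB this would then be called as out = rc4_mex('Key', 'Data').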

C initializing a (very) large integer array with values corresponding to index

Edit3: Optimized by limiting the initialization of the array to only odd numbers. Thank you @Ronnie!
Edit2: Thank you all, seems as if there's nothing more I can do for this.
Edit: I know Python and Haskell are implemented in other languages and more or less perform the same operation I have below, and that the compiled C code will beat them out any day. I'm just wondering if standard C (or any libraries) have built-in functions for doing this faster.
I'm implementing a prime sieve in C using Eratosthenes' algorithm and need to initialize an integer array of arbitrary size n from 0 to n. I know that in Python you could do:
integer_array = range(n)
and that's it. Or in Haskell:
integer_array = [1..n]
However, I can't seem to find an analogous method implemented in C. The solution I've come up with initializes the array and then iterates over it, assigning each value to the index at that point, but it feels incredibly inefficient.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int init_array()
{
    /*
     * assigning upper_limit manually in function for now, will expand to take value for
     * upper_limit from the command line later.
     */
    int upper_limit = 100000000;
    int size = floor(upper_limit / 2) + 1;
    int *int_array = malloc(sizeof(int) * size);
    // debug macro, basically replaces assert(), disregard.
    check(int_array != NULL, "Memory allocation error");
    int_array[0] = 0;
    int_array[1] = 2;
    int i;
    for (i = 2; i < size; i++) {
        int_array[i] = (i * 2) - 1;
    }
    // checking some arbitrary point in the array to make sure it assigned properly.
    // the value at any index 'i' should equal (i * 2) - 1 for i >= 2
    printf("%d\n", int_array[1000]);   // should equal 1999
    printf("%d\n", int_array[size-1]); // should equal 99999999
    free(int_array);
    return 0;

error:
    return -1;
}
Is there a better way to do this? (no, apparently there's not!)
The solution I've come up with initializes the array and then iterates over it, assigning each value to the index at that point, but it feels incredibly inefficient.
You may be able to cut down on the number of lines of code, but I do not think this has anything to do with "efficiency".
While there is only one line of code in Haskell and Python, what happens under the hood is the same thing as your C code does (in the best case; it could perform much worse depending on how it is implemented).
There are standard library functions to fill an array with constant values (and they could conceivably perform better, although I would not bet on that), but this does not apply here.
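(In C the closest such function is probably memset(), which works byte-wise and so can only produce values whose bytes are all identical, not the increasing sequence needed here. A self-contained illustration, with a hypothetical buffer name:)
#include <string.h>
#include <stdlib.h>

void fill_example(void)
{
    int *buf = malloc(100 * sizeof *buf);
    if (buf == NULL) return;
    /* Byte-wise constant fill: every element of buf becomes 0. */
    memset(buf, 0, 100 * sizeof *buf);
    free(buf);
}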
Here a better algorithm is probably a better bet in terms of optimising the allocation:
- Halve the size of int_array by taking advantage of the fact that you only need to test odd numbers in the sieve.
- Run this through some wheel factorisation for the numbers 3, 5, 7 to reduce the subsequent comparisons by 70%+.
That should speed things up; a rough sketch of the first point follows.
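As a very rough sketch of that idea in plain C (names invented here; the wheel part is left out), the sieve can store just one flag per odd candidate, with index i standing for the number 2*i + 1, so nothing beyond a zero-fill is needed to initialise it:
#include <stdio.h>
#include <stdlib.h>

/* Sketch: odd-only sieve of Eratosthenes up to 'limit'.
   composite[i] refers to the odd number 2*i + 1; no array of values is needed. */
static long count_primes(long limit)
{
    long half = (limit + 1) / 2;               /* number of odd candidates <= limit */
    char *composite = calloc(half, 1);         /* zero-filled: everything starts as "prime" */
    if (composite == NULL) return -1;

    for (long i = 1; (2*i + 1) * (2*i + 1) <= limit; i++) {
        if (composite[i]) continue;
        long p = 2*i + 1;
        /* Start at p*p (index (p*p - 1)/2) and step by p, i.e. over the odd multiples of p. */
        for (long j = (p*p - 1) / 2; j < half; j += p)
            composite[j] = 1;
    }

    long count = (limit >= 2) ? 1 : 0;          /* account for the prime 2 */
    for (long i = 1; i < half; i++)
        if (!composite[i]) count++;

    free(composite);
    return count;
}

int main(void)
{
    printf("%ld\n", count_primes(100000000));  /* should print 5761455 */
    return 0;
}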

Cuda replacing double for with 2D block

I'm really new to CUDA and have been trying to traverse a 2D array. I have the following code, which works as expected in plain C:
for (ty = 0; ty < s; ty++) {
    if (ty + pixY < s && ty + pixY >= 0) {
        for (tx = 0; tx < r; tx++) {
            T[ty/3][tx/3] += (tx + pixX < s && tx + pixX >= 0) ?
                *(image + M*(ty + pixY) + tx + pixX) * *(filter + fw*(ty%3) + tx%3) : 0;
        }
    }
}
Maybe I'm getting something wrong, but wouldn't this code translate to CUDA as follows?
tx = threadIdx.x;
ty = threadIdx.y;
T[ty/3][tx/3] += (tx + pixX < s && tx + pixX >= 0) ?
    *(image + M*(ty + pixY) + tx + pixX) * *(filter + fw*(ty%3) + tx%3) : 0;
provided I have defined my kernel parameters as dimGrid(1,1,1) and blockDim(r,s,1)
I ask because I'm getting unexpected results. Also, if I properly declare and allocate my arrays as 2D CUDA arrays instead of just one big 1D array, will this help?
Thanks for your help.
Leaving aside whether the array allocation and indexing schemes are correct (I am not sure there is enough information in the post to confirm that), and the fact that integer division and modulo are slow and should be avoided, you have a much more fundamental problem - a memory race.
Multiple threads within the single block you are using will be attempting to read and write to the same entry of T at the same time. CUDA makes no guarantees about the correctness of this sort of operation and it is almost certainly not going to work. The simplest alternative is to only use a single thread to compute each T[][] entry, rather than three threads. This eliminates the memory race.
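To make that concrete, here is the same computation reorganised in plain C so that each output element of T is produced by exactly one unit of work; in the CUDA version each iteration of the two outer loops would become one thread, indexed by (ox, oy). Variable names follow the question, but the element type (float), the flat row-major layout of T, and the assumption that r and s are multiples of 3 are mine:
/* Race-free reorganisation (sketch): one worker per output element T[oy][ox].
   Each worker reads its own 3x3 input window and accumulates into its own
   output entry, so no two workers ever write to the same T element. */
static void accumulate(float *T, const float *image, const float *filter,
                       int r, int s, int M, int fw, int pixX, int pixY)
{
    for (int oy = 0; oy < s / 3; oy++) {          /* one (oy, ox) pair == one CUDA thread */
        for (int ox = 0; ox < r / 3; ox++) {
            float acc = 0.0f;
            for (int dy = 0; dy < 3; dy++) {
                int ty = 3*oy + dy;
                if (ty + pixY < s && ty + pixY >= 0) {
                    for (int dx = 0; dx < 3; dx++) {
                        int tx = 3*ox + dx;
                        if (tx + pixX < s && tx + pixX >= 0)
                            acc += image[M*(ty + pixY) + tx + pixX]
                                 * filter[fw*dy + dx];   /* ty%3 == dy, tx%3 == dx */
                    }
                }
            }
            T[(r/3)*oy + ox] += acc;              /* T stored row-major as a flat array here */
        }
    }
}
With this decomposition the kernel launch would use one thread per output element (e.g. blockDim(r/3, s/3, 1)), and the race disappears.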

How can I efficiently convert a large decimal array into a binary array in MATLAB?

Here's the code I am using now, where decimal1 is an array of decimal values, and B is the number of bits in binary for each value:
for (i = 0:1:length(decimal1)-1)
    out = dec2binvec(decimal1(i+1), B);
    for (j = 0:B-1)
        bit_stream(B*i+j+1) = out(B-j);
    end
end
The code works, but it takes a long time if the length of the decimal array is large. Is there a more efficient way to do this?
bitstream = zeros(nelem * B,1);
for i = 1:nelem
bitstream((i-1)*B+1:i*B) = fliplr(dec2binvec(decimal1(i),B));
end
I think that should be correct and a lot faster (hope so :) ).
edit:
I think your main problem is that you probably don't preallocate the bit_stream matrix.
I tested both codes for speed and I see that yours is faster than mine (though not by much) if we both preallocate bitstream, even though I (kinda) vectorized my code.
If we DON'T preallocate the bitstream, my code is A LOT faster. That happens because your code reallocates the matrix more often than mine.
So, if you know the B upfront, use your code, else use mine (of course both have to be modified a little bit to determine the length at runtime, which is no problem since dec2binvec can be called without the B parameter).
The function DEC2BINVEC from the Data Acquisition Toolbox is very similar to the built-in function DEC2BIN, so some of the alternatives discussed in this question may be of use to you. Here's one option to try, using the function BITGET:
decimal1 = ...; %# Your array of decimal values
B = ...; %# The number of bits to get for each value
nValues = numel(decimal1); %# Number of values in decimal1
bit_stream = zeros(1,nValues*B); %# Initialize bit stream
for iBit = 1:B %# Loop over the bits
bit_stream(iBit:B:end) = bitget(decimal1,B-iBit+1); %# Get the bit values
end
This should give the same results as your sample code, but should be significantly faster.
