I have a MATLAB function that reads a big matrix and calculates its singular value decomposition (SVD). However, I need to run it on Linux systems without installing MATLAB on every new system, so I'd like to have it converted into C source code. The code is really simple:
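function singular(m)
% Rank-k approximation of the matrix in matlab.txt, with k = floor(sqrt(m))
% load creates a variable named "matlab" from the text file
load c:\som\matlab.txt
[U,S,V]=svd(matlab);
% m is passed in as a string
m = str2num(m);
% keep the first k left singular vectors, the first k rows of V'
% and the leading k-by-k block of S
U1=U(:,1:floor(sqrt(m)));
V1=V';
Vt=V1(1:floor(sqrt(m)),:);
S1=S(1:floor(sqrt(m)),1:floor(sqrt(m)));
matlab1=U1*S1*Vt;
matlab2=abs(matlab1);
save c:\som\matlab1.txt matlab1 -ascii
save c:\som\matlab2.txt matlab2 -ascii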
You can use MATLAB Coder, but I advise converting the code by hand, because some functions are not convertible and the generated code does not perform much better than a manual implementation.
To implement the SVD by hand, see: SVD
I am using Intel Fortran with Visual Studio 2008 SP1.
My main question is: I would like to read a 2D array from a MATLAB .mat file into Fortran. I would also like to save the 2D matrices that Fortran outputs, preferably to a .mat file. Currently I can save them to a text file using:
write(unit = #, <linelength>F22.8>),matrixname
This line works, but I am not sure whether I lose any of my double precision. If I do not lose precision, I can stick with it; otherwise I would need help. In that case I would only need a way to read from a MATLAB file into Intel Fortran while keeping the precision. There are no characters in these arrays; they hold only numerical values.
I need to preserve the precision, since I am working with spherical functions, and they can be highly divergent.
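Whether a format like F22.8 loses precision can be checked with a quick round-trip experiment. The sketch below uses Python purely as a neutral test bench (it is not part of the Fortran program); the helper name round_trips and the sample value are made up for illustration. A double needs up to 17 significant digits to survive a text round trip, so a fixed format with 8 decimals generally does not preserve it, while a scientific format with 16 decimals (something like ES24.16 in Fortran) does.

```python
def round_trips(value, fmt):
    """Return True if formatting value with fmt and parsing it back is lossless."""
    return float(format(value, fmt)) == value

x = 1.0 / 3.0                      # hypothetical sample value

fixed = round_trips(x, "22.8f")    # analogue of Fortran's F22.8
sci17 = round_trips(x, ".16e")     # 17 significant digits

print(fixed, sci17)                # the fixed format is lossy, the 17-digit one is not
```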
MATLAB's internal .mat format may or may not be compressed, depending on the version, so I don't think you want to use it for portable file transfer. (Having tried to find good documentation on the subject, I wonder whether @HPM was being sarcastic in his comment.)
A keep-it-simple approach for a single array is to exchange it as raw binary.
Example write in MATLAB:
a=[1. 2. ; 3. 4. ]
fileID = fopen('test.bin','w');
fwrite(fileID,a,'double');
fclose(fileID);
Then in Fortran:
implicit none
double precision a(2,2)
! stream access reads the raw bytes without record markers
open(unit=100,file='test.bin',access='stream',form='unformatted')
read(100) a
close(100)
Note that the data here is actually "flat": the reading program needs to know the array dimensions. You can of course write the dimensions to the file if you need to.
There are of course a number of potential portability issues with binary data, but this will work in most cases, assuming you are reading and writing on the same hardware.
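If the reader should not have to hard-code the array shape, the dimensions can be written ahead of the data. Here is a minimal sketch of such a layout, written in Python for brevity. The two-integer header, the little-endian byte order, the file name, and the helper names are all assumptions of this example, not a MATLAB or Fortran convention. The values are stored column by column, which is the order both MATLAB and Fortran use.

```python
import os
import struct
import tempfile

def write_matrix(path, rows):
    """Write a 2-D list as: int32 nrows, int32 ncols, then doubles in column-major order."""
    nrows, ncols = len(rows), len(rows[0])
    flat = [rows[i][j] for j in range(ncols) for i in range(nrows)]
    with open(path, "wb") as f:
        f.write(struct.pack("<2i", nrows, ncols))        # header: the two dimensions
        f.write(struct.pack("<%dd" % len(flat), *flat))  # payload: raw doubles

def read_matrix(path):
    """Read the header first, then exactly nrows*ncols doubles."""
    with open(path, "rb") as f:
        nrows, ncols = struct.unpack("<2i", f.read(8))
        flat = struct.unpack("<%dd" % (nrows * ncols), f.read(8 * nrows * ncols))
    return [[flat[j * nrows + i] for j in range(ncols)] for i in range(nrows)]

path = os.path.join(tempfile.mkdtemp(), "test.bin")
a = [[1.0, 2.0], [3.0, 4.0]]
write_matrix(path, a)
b = read_matrix(path)
print(b)
```

On the MATLAB side the same header could presumably be produced with fwrite(fileID, size(a), 'int32') before the data; a Fortran reader would then read the two integers, allocate the array, and read the doubles.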
I have 64 output binary files from an MPI simulation using a C code.
The files correspond to the output of 64 processes. What would be a way to join all those files into a single file, perhaps using a C script?
Since this was tagged MPI, I'll offer an MPI solution, though it might not be something the questioner can do.
If you are able to modify the simulation, why not adopt an MPI-IO approach? Even better, look into HDF5 or Parallel-NetCDF and get a self-describing file format, platform portability, and a host of analysis and vis tools that already understand your file format.
But no matter which approach you take, the general idea is to use MPI to describe which part of the file belongs to each process. The easiest example is when each process contributes to a 1D array: for a logically global array of N items spread over P processes, each process contributes N/P items at offset myrank*(N/P).
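The block decomposition described above comes down to a little index arithmetic. The following is a sketch in Python; the function name block_range is hypothetical. It also spreads any remainder over the first ranks when the item count is not divisible by the number of processes, a case the text does not cover.

```python
def block_range(rank, nprocs, n_items):
    """Return (offset, count) of the contiguous block owned by rank."""
    base, extra = divmod(n_items, nprocs)
    count = base + (1 if rank < extra else 0)   # first `extra` ranks get one more item
    offset = rank * base + min(rank, extra)     # items owned by all lower ranks
    return offset, count

ranges = [block_range(r, 4, 10) for r in range(4)]
print(ranges)
```

The offsets are in items; multiplied by the item size in bytes, they are the kind of value a call such as MPI_File_write_at takes as its file offset.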
Since all the output files are fairly small and the same size, it would be easy to use MPI_Gather to assemble one large binary array on one node, which could then be written to a file. If allocating a large array is an issue, you could simply use MPI_Isend and MPI_Recv to write to the file one piece at a time.
Obviously this is a pretty primitive solution, but it is also very straightforward, foolproof and really won't take notably longer (assuming you're doing all this at the end of your simulation).
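If the simulation cannot be changed and the per-process files already exist, they can also be joined after the fact by concatenating them in rank order. This sketch is in Python rather than C, for brevity. The naming pattern out_<rank>.bin is an assumption, and so is the premise that plain concatenation in rank order gives the layout you want.

```python
import os
import shutil
import tempfile

def join_files(part_paths, out_path):
    """Concatenate the given files, in the order listed, into out_path."""
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)   # copies in chunks, so large files are fine

# Small demonstration with 4 fake rank files (hypothetical names out_<rank>.bin).
workdir = tempfile.mkdtemp()
parts = []
for rank in range(4):
    p = os.path.join(workdir, "out_%d.bin" % rank)
    with open(p, "wb") as f:
        f.write(bytes([rank]) * 3)             # 3 bytes, each equal to the rank
    parts.append(p)

joined = os.path.join(workdir, "joined.bin")
join_files(parts, joined)
with open(joined, "rb") as f:
    data = f.read()
print(len(data))
```

Building the list from range(...) rather than sorting directory entries avoids the lexicographic trap where out_10.bin sorts before out_2.bin.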
I would like to implement an R package written in C.
The C code must:
take an array (of any type) as input.
produce an array as output (of unpredictable size).
What is the best practice of implementing array passing?
At the moment the C code is called with .C(). It accesses the array directly from R, through a pointer. Unfortunately the same can't be done for the output, as the output dimensions need to be known in advance, which is not true in my case.
Would it make sense to pass the array from C to R through a file? For example, in ramfs, if using Linux?
Update 1:
Here the exact same problem was discussed.
There, one possible way of returning an array with unknown dimensions was mentioned:
Splitting the external function into two, at the point before the array is calculated but after its dimensions are known. The first part would return the dimensions, then an empty array would be prepared, then the second part would run and populate the array in R.
In my case the full dimensions are known only once the whole code has executed, so this method would mean running the C code twice. Taking a guess at the maximal array size isn't an option either.
Update 2: It seems the only way to do this is to use .Call() instead, as power suggested. Here are a few good examples: http://www.sfu.ca/~sblay/R-C-interface.ppt.
Thanks.
What is the best practice of implementing array passing?
Is the package already written in ANSI C? .C() would then be quick and easy.
If you are writing from scratch, I suggest .Call() and Rcpp. In this way, you can pass R objects to your C/C++ code.
Would it make sense to pass array through a file?
No
Read "Writing R Extensions".
I have a large Array/Matrix with 5899091 rows and 11 columns. I am storing it in a text file.
I read it with dlmread() in MATLAB every time I need it. However, this takes a lot of time (more than 1 minute), and I need to read the file again and again. I am stuck in this situation. My questions are:
1) Is there any way to read the file just once and save it in any kind of global/persistent Matrix?
2) Is there a better way to read a text file and convert it into a matrix in a more efficient way?
Thanks in advance.
You might get the performance you want from a memory-mapped file. Investigate the Matlab function memmapfile. It's not something I use much so won't offer any further advice which is likely to be wrong.
The best option is almost certainly to simply read the file once in a script or control function and then pass it as a variable to any subsequent functions which require that data. This is just as much work as adding the global declarations and is cleaner, more maintainable and more flexible.
You can also save the variable to a MAT file. If each element in your file is of type double, it should be about 0.52 GB in size (5899091 × 11 elements at 8 bytes each). The MAT format is efficient, but the major benefit comes from storing your numbers as numbers instead of text. With 5 or 8 significant digits, the same numbers in ASCII take roughly 0.8 or 1.2 GB respectively.
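The in-memory size follows directly from the dimensions given in the question. A short worked calculation, in Python only for the arithmetic: 5899091 × 11 doubles at 8 bytes each is about 0.52 GB, or roughly 4.15 gigabits.

```python
rows, cols = 5899091, 11
bytes_per_double = 8

n_elements = rows * cols
n_bytes = n_elements * bytes_per_double
n_bits = n_bytes * 8

print(n_elements)                 # 64890001
print(n_bytes)                    # 519120008
print(round(n_bytes / 1e9, 2))    # 0.52
print(round(n_bits / 1e9, 2))     # 4.15
```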
If for some reason you really don't want to pass the data as a variable, I would recommend nested functions over global variables:
function aResult = aFunction(var)
data = dlmread(...);
aResult = bFunction(var);
    function bResult = bFunction(var)
        % data is visible here through the nested-function scope
        bResult = cFunction(data);
    end
end
Of course at this point you are still wrapping the business functions in something. The scoping rules are helpful.
Now, if the real problem is just the size of this file - that is, it's too big for memory and you are using range arguments to dlmread to access the file in chunks - then you should probably take the time to design a format for use with memmapfile. This Wikipedia page explains the potential benefits.
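The idea behind a memory-mapped format can be sketched outside MATLAB as well. The example below uses Python's mmap module as a stand-in for memmapfile. The layout (row-major, fixed-size records of 11 little-endian doubles per row), the file name, and the helper names are assumptions chosen for illustration. Because every row has the same byte length, row i can be located by arithmetic and decoded without reading the rest of the file.

```python
import mmap
import os
import struct
import tempfile

NCOLS = 11
ROW_FMT = "<%dd" % NCOLS
ROW_BYTES = struct.calcsize(ROW_FMT)    # 88 bytes per row

def read_row(mapped, i):
    """Decode row i straight from the mapped file."""
    return struct.unpack_from(ROW_FMT, mapped, i * ROW_BYTES)

# Build a small demo file with 100 rows; row r holds r, r+1, ..., r+10.
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
with open(path, "wb") as f:
    for r in range(100):
        f.write(struct.pack(ROW_FMT, *[float(r + c) for c in range(NCOLS)]))

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    row42 = read_row(mapped, 42)        # only the touched pages are read from disk
    mapped.close()

print(row42[0], row42[-1])
```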
Then there is the brute force solution.
You want to use global variables. Declare the global at the top of the function and it will be shared by the functions it is declared in: see http://www.mit.edu/people/abbe/matlab/globals.html
Use a .mat file. It will be slightly faster. Also, if the matrix is easy to create (a large identity or eye matrix, for example), it may be quicker to generate it on the fly. Lastly, if your matrix is sparse, use the sparse matrix operations.
You can read the file once and save it to MATLAB's MAT file. Then you can access the saved variables fully or partially (basically as any variable in MATLAB workspace) directly from the file using MATFILE. I have answered a similar question about it here. Please have a look.
I have a small bit of existing C code that I want to wrap using Cython. I want to be able to set up a number of numpy arrays, and then pass those arrays as arguments to the C code whose functions take standard c arrays (1d and 2d). I'm a little stuck in terms of figuring out how to write the proper .pyx code to properly handle things.
There are a handful of functions, but a typical function in the file funcs.h looks something like:
double InnerProduct(double *A, double **coords1, double **coords2, const int len)
I then have a .pyx file that has a corresponding line:
cdef extern from "funcs.h":
double InnerProduct(double *A, double **coords1, double **coords2, int len)
where I got rid of the const because cython doesn't support it. Where I'm stuck is what the wrapper code should then look like to pass a MxN numpy array to the **coords1 and **coords2 arguments.
I've struggled to find the correct documentation or tutorials for this type of problem. Any suggestions would be most appreciated.
You probably want Cython's "typed memoryviews" feature, which you can read about in full gory detail here. This is basically the newer, more unified way to work with numpy or other arrays. These can be exposed in Python-land as numpy arrays, or you can export them to Python (for example, here). You have to pay attention to how the striding works and make sure you're consistent about e.g. C-contiguous vs. FORTRAN-like arrays, but the docs are pretty clear on how to do that.
Without knowing a bit more about your function it's hard to be more concrete on exactly the best way to do this - i.e., is the C function read-only for the arrays? (I think yes based on the signature you gave, but am not 100% sure.) If so you don't need to worry about making copies if needed to get C-contiguous states, because the C function doesn't need to talk back to the Python-level numpy array. But typed memoryviews will let you do any of this with a minimum of fuss.
The cython interface code should be created according to the tutorial given here.
To get a C pointer to the data in a numpy array, you should use the ctypes attribute of the numpy array, which is described here.
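One detail matters for the double ** arguments: such a parameter expects an array of row pointers, not a single contiguous block. The sketch below shows one way to build that pointer table with the standard-library ctypes module. It uses a plain ctypes buffer instead of a numpy array so that it runs without numpy, and the helper name as_double_pp is hypothetical. With numpy, the base address would come from the array's ctypes.data attribute instead of ctypes.addressof.

```python
import ctypes

def as_double_pp(flat, nrows, ncols):
    """Build a double** style table of row pointers over a contiguous ctypes buffer."""
    RowPtr = ctypes.POINTER(ctypes.c_double)
    row_bytes = ncols * ctypes.sizeof(ctypes.c_double)
    base = ctypes.addressof(flat)
    # One pointer per row, each aimed at the start of that row inside `flat`.
    return (RowPtr * nrows)(*[
        ctypes.cast(base + i * row_bytes, RowPtr) for i in range(nrows)
    ])

nrows, ncols = 2, 3
# `flat` owns the memory and must stay alive as long as the pointer table is used.
flat = (ctypes.c_double * (nrows * ncols))(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
coords = as_double_pp(flat, nrows, ncols)

print(coords[1][2])    # element in row 1, column 2 of the C-contiguous data
```

The table does not copy anything: writing through coords[i][j] changes the underlying buffer, which is what a C function taking double **coords1 would see.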