I have a small bit of existing C code that I want to wrap using Cython. I want to set up a number of numpy arrays and then pass them as arguments to the C functions, which take standard C arrays (1D and 2D). I'm a little stuck on how to write the .pyx code to handle this properly.
There are a handful of functions, but a typical function in the file funcs.h looks something like:
double InnerProduct(double *A, double **coords1, double **coords2, const int len)
I then have a .pyx file that has a corresponding line:
cdef extern from "funcs.h":
    double InnerProduct(double *A, double **coords1, double **coords2, int len)
where I dropped the const because I read that Cython doesn't support it (newer Cython releases do accept const in extern declarations). Where I'm stuck is what the wrapper code should look like to pass an MxN numpy array to the **coords1 and **coords2 arguments.
I've struggled to find the correct documentation or tutorials for this type of problem. Any suggestions would be most appreciated.
You probably want Cython's "typed memoryviews" feature, which you can read about in full gory detail here. This is basically the newer, more unified way to work with numpy or other arrays. These can be exposed in Python-land as numpy arrays, or you can export them to Python (for example, here). You have to pay attention to how the striding works and make sure you're consistent about e.g. C-contiguous vs. FORTRAN-like arrays, but the docs are pretty clear on how to do that.
Without knowing a bit more about your function, it's hard to be more concrete about the best way to do this. For example, does the C function only read the arrays? (I think so, based on the signature you gave, but I'm not 100% sure.) If so, it is harmless to make a copy when one is needed to get a C-contiguous array, because the C function never has to write back to the Python-level numpy array. Either way, typed memoryviews let you do all of this with a minimum of fuss.
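One step the answer above leaves implicit: a typed memoryview of a C-contiguous 2D array (for example one declared as double[:, ::1]) gives you a single double* to the data via &arr[0, 0], but a double** parameter expects an array of row pointers. The wrapper therefore has to build that array itself. Below is a minimal C sketch of that step, assuming row-major (C-contiguous) data. make_row_pointers and sum_matrix are hypothetical names, and sum_matrix only stands in for a function like InnerProduct, whose actual behaviour isn't shown in the question.

```c
#include <stdlib.h>

/* Build an array of row pointers over a contiguous, row-major
   (C-contiguous) buffer of m*n doubles, so that rows[i][j] == data[i*n + j].
   The data is not copied: the buffer must outlive the returned array,
   and the caller releases the returned array with free(). */
double **make_row_pointers(double *data, int m, int n) {
    double **rows = malloc((size_t)m * sizeof *rows);
    if (rows == NULL) return NULL;
    for (int i = 0; i < m; i++) {
        rows[i] = data + (size_t)i * n;   /* start of row i */
    }
    return rows;
}

/* Hypothetical stand-in for a C function with a double** parameter:
   it simply sums every element of the m x n matrix. */
double sum_matrix(double **rows, int m, int n) {
    double total = 0.0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            total += rows[i][j];
    return total;
}
```

In a .pyx file the same loop can be written with malloc and free cimported from libc.stdlib, filling the pointer array with &arr[i, 0] for each row of the memoryview before calling the C function, and freeing it afterwards.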
The Cython interface code should be written following the tutorial given here.
To get a C pointer to the data in a numpy array, you should use the ctypes attribute of the numpy array, which is described here.
Related
I want to convert the Python code below to Cython:
x_array = []
x_array.append(x_new)
I tried the following Cython code, but it gives an error:
cdef np.ndarray[double, dim=1] x_array
x_array.append(x_new)
The error shows:
Cannot coerce list to type [double, dim=1]
Your options are:
cdef list x_array. This lets Cython know that the type of x_array is actually a list. You may get a small speed-up from this.
Make x_array a numpy array instead. If all the elements in the list are of the same simple numeric type, this is probably the better option. Be aware that appending to numpy arrays is likely to be pretty slow, so you should calculate the size in advance.
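cimport numpy as np
import numpy as np
# inside a function; some_precomputed_size is the final length, known in advance
cdef np.ndarray[double, ndim=1] x_array = np.zeros((some_precomputed_size,))
# or
cdef double[:] x_array = np.zeros((some_precomputed_size,))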
Note that this will only give you a speed-up for some types of operations (mostly accessing individual elements in Cython).
If you're set on using Python lists you can sometimes get a speed-up by accessing them through the Python C API in Cython. This answer provides an example of where that worked well. This works best when you know a size in advance and so you can pre-allocate the array (i.e. don't append!) and also avoid some Cython reference counting. It's very easy to go wrong and make reference counting errors with this method, so proceed carefully.
I'm looking to evaluate math expressions stored as strings in an array of pointers, like
const char *equations[] = {"n+1", "n+2", "n*n+3"};
I want the compiler to treat the strings in the array above as expressions in which, e.g., "n" is a variable, so that when I assign one of these strings to an int, it acts like a mathematical operation, like this:
int a = n+1;
I thought the following might work, but it definitely doesn't, because you can't assign an element of a pointer array to an int. Even when I force it to, all I get is character codes (like A = 65), which is not what I need:
a = equations[0]; // hoping the compiler treats this as a = n+1
The compiler cannot do that for you; you will have to parse the strings into their components (variables, constants, operators) and then apply the appropriate operations yourself.
There are many ways to do this. For example, you could parse each expression, do some pattern matching, and then build an expression from the result, which is of course much easier said than done.
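A minimal sketch of the parse-and-evaluate approach, assuming the expressions contain only non-negative integer constants, the single variable n, the operators + - * / and parentheses. It is a small recursive-descent evaluator written for illustration (Parser and eval_with_n are hypothetical names), and it does no error reporting: parsing simply stops at the first character it does not recognise.

```c
#include <ctype.h>

/* Parser state: current position in the string and the value of n. */
typedef struct {
    const char *p;
    long n;
} Parser;

static long parse_expr(Parser *ps);

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* factor := number | 'n' | '(' expr ')' | '-' factor */
static long parse_factor(Parser *ps) {
    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++;
        return v;
    }
    if (*ps->p == '-') {
        ps->p++;
        return -parse_factor(ps);
    }
    if (*ps->p == 'n') {
        ps->p++;
        return ps->n;
    }
    long v = 0;
    while (isdigit((unsigned char)*ps->p)) {
        v = v * 10 + (*ps->p - '0');
        ps->p++;
    }
    return v;
}

/* term := factor (('*' | '/') factor)*  (left-associative) */
static long parse_term(Parser *ps) {
    long v = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return v;
        ps->p++;
        long rhs = parse_factor(ps);
        /* division by zero yields 0 here, purely to keep the sketch safe */
        v = (op == '*') ? v * rhs : (rhs != 0 ? v / rhs : 0);
    }
}

/* expr := term (('+' | '-') term)*  (left-associative) */
static long parse_expr(Parser *ps) {
    long v = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Evaluate expr with the variable n bound to the given value. */
long eval_with_n(const char *expr, long n) {
    Parser ps = { expr, n };
    return parse_expr(&ps);
}
```

With the array declared as const char *equations[] = {"n+1", "n+2", "n*n+3"};, eval_with_n(equations[2], 4) evaluates n*n+3 with n = 4 and returns 19. Supporting more variables, floating point or functions means extending the grammar, which is where a ready-made library becomes attractive.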
I've also found a library (in C++) that does what you want, or promises to, though I haven't tested it yet. Here is the link:
http://partow.net/programming/exprtk/index.html
No, what you want is not possible, because in compiled C code the notion of a "variable name" does not exist.
If you want to achieve this sort of thing, you have to do it before compilation, i.e., during preprocessing.
Otherwise, a more flexible way of achieving what you "probably" want is to use function pointers ("callbacks", if you prefer). You can define different functions to do certain jobs and then, at run time, choose which of these already-defined functions to call and store the result in the desired variable.
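The function-pointer approach can be sketched as follows, assuming the set of equations is known when the program is compiled. The function names and apply_equation are hypothetical, chosen to mirror the three strings in the question.

```c
#include <stddef.h>

/* Each "equation" is an ordinary C function written at compile time. */
static int n_plus_1(int n)    { return n + 1; }
static int n_plus_2(int n)    { return n + 2; }
static int n_sq_plus_3(int n) { return n * n + 3; }

/* Pointer type for functions taking and returning an int. */
typedef int (*equation_fn)(int);

/* Table of function pointers; which entry to call can be chosen at run time. */
static const equation_fn equations[] = { n_plus_1, n_plus_2, n_sq_plus_3 };

/* Evaluate equation number idx for the given n.
   Returns 0 for an out-of-range index (illustrative error handling only). */
int apply_equation(size_t idx, int n) {
    if (idx >= sizeof equations / sizeof equations[0]) return 0;
    return equations[idx](n);
}
```

int a = apply_equation(0, n); then gives the same result as int a = n+1;, with the index selected at run time rather than at compile time. The trade-off is that the expressions themselves cannot be supplied as strings while the program runs.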
I have a fine-tuned algorithm in MATLAB that operates on matrices (of course). I've used MATLAB Coder to generate C code for this algorithm, and it works as expected.
Here's a function call that I used in MATLAB:
x = B/A
where
B is of size 1x500 (rows x columns),
A is of size 10x500, and
x, the result, is of size 1x10.
When this is converted into C source using MATLAB Coder, I noticed that the generated function takes parameters with exactly the sizes above (A is flattened to 10*500 = 5000 elements):
void myfunction(const double B[500], const double A[5000], double x[10])
For prototyping and testing purposes this seems okay. However, in production I would like this function to work with other sizes too; for example, 100 instead of 500 in the variables above should also work. How can I remove the dependence on matrix dimensions from my algorithm?
Additionally, a few lines of the generated code use hard-coded constants. For example, there is code like
if (rankR <= 1.4903363393874656E-8) {
    // some internal function calls
} else {
    // usage of standard sqrt
}
or
500.0 * fabs(A[0]) * 2.2204460492503131E-16
Could anyone explain what these hard-coded constants are? Are they generated from the test data that I used in MATLAB?
If the function call you refer to is the entry-point function, you can define the size when setting up Coder. The simplest way to run the Coder is using the GUI from the 'Apps' menu inside MATLAB (or type 'coder' at the console). After specifying the entry-point function, step 2 is to define the type and size for each of the input variables.
For each dimension of your input variable (there can be more than 2 if necessary), you can specify the:
n - dimension is exactly n long
:n - dimension is up to n long
inf - dimension is unbounded
If the function call is not the entry-point function, and is buried inside your code (or if you are running the codegen function from the console), you can explicitly define variables as being of varying size:
coder.varsize('myVariableName');
Bear in mind that some functions can only be used (with Coder) with fixed-sized inputs.
Fuller description here:
http://uk.mathworks.com/help/fixedpoint/ug/defining-variable-size-data-for-code-generation.html#br9t627
Unfortunately, I'm not sure about the hard-coded constants.
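One of the constants can be identified: 2.2204460492503131E-16 is the double-precision machine epsilon (eps in MATLAB, DBL_EPSILON in C's float.h). The factor 500.0 in the same expression matches the 500-column dimension in the question, which suggests (not confirmed here) that the generated code scales a tolerance by a matrix size that was fixed at code-generation time. The sketch below computes the same kind of tolerance with the size passed in at run time; size_scaled_tolerance is a hypothetical helper, not MATLAB Coder output.

```c
#include <float.h>
#include <math.h>

/* 2.2204460492503131E-16 is DBL_EPSILON, the gap between 1.0 and the
   next representable double.

   Hypothetical helper: a tolerance of the form n * |a0| * eps, mirroring
   the generated expression 500.0 * fabs(A[0]) * 2.2204460492503131E-16
   but taking the dimension n as a parameter instead of hard-coding 500. */
double size_scaled_tolerance(int n, double a0) {
    return (double)n * fabs(a0) * DBL_EPSILON;
}
```

Whether the other constant (1.4903363393874656E-8) is derived in a similar way isn't something the generated snippet alone reveals.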
I have to use the MPI API to send and receive matrices in my programs. To send a matrix, I have used the following syntaxes:
MPI_Send(matrix, ...) <- USE THIS
MPI_Send(&matrix, ...)
MPI_Send(&matrix[0][0], ...)
Similar to the last one, but untested: MPI_Send(&matrix[0], ...)
I have seen all of the above in different examples, and I wonder whether there is a difference between them, or whether any of them will cause errors with large data. I'm asking because I heard someone with a little HPC experience say to use only the first syntax (listed above), but I didn't have time to ask him why, and I can't find the reason on the Internet (or I don't know where to look). All of the above work just fine in the small examples I compiled, but I have only 2 cores on my processor, so I may not be able to see the problem.
In my understanding, these expressions all point to the same memory address, so what would the problem be? Is there an issue specific to the MPI API?
Thanks in advance
All the matrix references in your question point to the same address, albeit with different types. MPI_Send takes a void*, so it loses the type anyway. So all calls are practically equivalent.
If the type of matrix is changed from a 2D array to a pointer, then the above are no longer equivalent. E.g., if matrix is an int*, only the first and the last options both compile and produce the required address: &matrix compiles but gives the address of the pointer variable itself, and &matrix[0][0] does not compile. Therefore, I too would recommend using the first option: it's safer should the type of matrix change.
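The address equivalence described above can be checked directly. A small sketch, assuming matrix is a fixed-size 2D array (int matrix[3][4] here, chosen arbitrarily), contrasted with a plain int pointer; same_address_2d and pointer_address_differs are hypothetical names.

```c
/* For a true 2D array, matrix, &matrix, &matrix[0] and &matrix[0][0]
   all refer to the start of the same contiguous block; only their types
   differ. Converting each to void * (the type of MPI_Send's buffer
   parameter) therefore yields the same address. */
int same_address_2d(void) {
    int matrix[3][4] = {{0}};
    void *a = matrix;          /* decays to int (*)[4] */
    void *b = &matrix;         /* int (*)[3][4] */
    void *c = &matrix[0];      /* int (*)[4] */
    void *d = &matrix[0][0];   /* int * */
    return a == b && b == c && c == d;
}

/* With a pointer instead of an array, &data is the address of the
   pointer variable itself, not of the elements it points to. */
int pointer_address_differs(void) {
    static int storage[12];
    int *data = storage;
    void *elements = data;     /* same as &data[0]: address of the elements */
    void *variable = &data;    /* address of the local pointer variable */
    return elements != variable;
}
```

Passing &data to MPI_Send in the pointer case would therefore send the bytes of the pointer and whatever follows it on the stack, not the matrix.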
With MPI_Send() you only pass a pointer to the matrix data, so it is recommended to store the matrix as a one-dimensional (contiguous) array:
int *data = (int*)calloc(cols*rows,sizeof(int));
MPI_Send(data,cols*rows, ...);
When you use MPI_Recv(), remember that what you sent is a flat array, even though you will probably want to work with it as a matrix.
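Working with the received flat array as a matrix only needs row-major index arithmetic: element (i, j) is stored at data[i * cols + j]. A minimal sketch, assuming int elements as in the calloc example above; idx, make_flat_matrix and get_element are hypothetical helpers, and the MPI calls themselves are left out.

```c
#include <stdlib.h>

/* Row-major index into a flat rows x cols array. */
static size_t idx(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Allocate a flat rows x cols buffer and fill element (i, j) with 10*i + j.
   A buffer filled by MPI_Recv can be indexed in exactly the same way. */
int *make_flat_matrix(size_t rows, size_t cols) {
    int *data = calloc(rows * cols, sizeof *data);
    if (data == NULL) return NULL;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            data[idx(i, j, cols)] = (int)(10 * i + j);
    return data;
}

/* Read element (i, j) back out of the flat buffer. */
int get_element(const int *data, size_t i, size_t j, size_t cols) {
    return data[idx(i, j, cols)];
}
```

Because the whole matrix is one contiguous block, a single send and receive with a count of rows*cols moves all of it, and both sides only need to agree on rows and cols.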
I would like to implement an R package written in C.
The C code must:
take an array (of any type) as input.
produce an array as output (of unpredictable size).
What is the best practice of implementing array passing?
At the moment, the C code is called with .C(). It accesses the input array directly from R through a pointer. Unfortunately, the same can't be done for the output, as the output dimensions need to be known in advance, which is not the case here.
Would it make sense to pass the array from C to R through a file? For example, on a ramfs, if using Linux?
Update 1:
Here the exact same problem was discussed.
There, one possible option for returning an array with unknown dimensions was mentioned:
Split the external function into two at the point after the dimensions are known but before the array is calculated. The first part would return the dimensions, then an empty array would be prepared in R, and then the second part would run and populate that array.
In my case, the full dimensions are known only once the whole code has executed, so this method would mean running the C code twice. Taking a guess at the maximal array size isn't an option either.
Update 2: It seems the only way to do this is to use .Call() instead, as power suggested. Here are a few good examples: http://www.sfu.ca/~sblay/R-C-interface.ppt.
Thanks.
What is the best practice of implementing array passing?
Is the package already written in ANSI C? .C() would then be quick and easy.
If you are writing from scratch, I suggest .Call() and Rcpp. In this way, you can pass R objects to your C/C++ code.
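With .Call(), the C code can compute its result first and create the R object only once the final size is known. Below is a minimal sketch of the C-side half of that idea, assuming a double result collected into a growable buffer. DoubleBuf, buf_push and keep_positive are hypothetical names, and no R API calls are included so that the sketch stays plain C.

```c
#include <stdlib.h>

/* Hypothetical growable buffer for results whose final count is only
   known when the computation finishes. */
typedef struct {
    double *data;
    size_t len;
    size_t cap;
} DoubleBuf;

/* Append one value, doubling the capacity when full. Returns 0 on
   success, -1 if realloc fails (existing contents stay valid). */
int buf_push(DoubleBuf *b, double x) {
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? 2 * b->cap : 16;
        double *p = realloc(b->data, new_cap * sizeof *p);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = x;
    return 0;
}

/* Example producer whose output size depends on the input:
   keeps only the strictly positive values of x. */
DoubleBuf keep_positive(const double *x, size_t n) {
    DoubleBuf out = { NULL, 0, 0 };
    for (size_t i = 0; i < n; i++)
        if (x[i] > 0.0 && buf_push(&out, x[i]) != 0)
            break;  /* out of memory: return what was collected so far */
    return out;
}
```

In a .Call() entry point you would then allocate the R vector with the final length (for example with allocVector(REALSXP, out.len) inside PROTECT/UNPROTECT), copy out.data into REAL(...) with memcpy, free the buffer, and return the vector; "Writing R Extensions" documents those calls.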
Would it make sense to pass array through a file?
No
Read "Writing R Extensions".