how to define a list in Cython - arrays

I want to translate the Python code below into Cython:
x_array = []
x_array.append(x_new)
I tried the following Cython code, but it gives an error:
cdef np.ndarray[double, dim=1] x_array
x_array.append(x_new)
The error shows:
Cannot coerce list to type [double, dim=1]

Your options are:
cdef list x_array. This lets Cython know that the type of x_array is actually a list. You may get a small speed-up from this.
Make x_array a numpy array instead. If all the elements in the list are the same simple, numeric type then this is probably a better option. Be aware that appending to numpy arrays is likely to be pretty slow, so you should calculate the size in advance.
cdef np.ndarray[double, ndim=1] x_array = np.zeros((some_precomputed_size,))
# or
cdef double[:] x_array = np.zeros((some_precomputed_size,))
Note that this will only give you a speed-up for some types of operations (mostly accessing individual elements in Cython).
If you're set on using Python lists you can sometimes get a speed-up by accessing them through the Python C API in Cython. This answer provides an example of where that worked well. This works best when you know a size in advance and so you can pre-allocate the array (i.e. don't append!) and also avoid some Cython reference counting. It's very easy to go wrong and make reference counting errors with this method, so proceed carefully.
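For the numpy option above, the preallocate-then-fill pattern looks like this in plain Python (the size and values here are purely illustrative):

```python
import numpy as np

n = 1000                      # hypothetical precomputed size
x_array = np.zeros(n)         # allocate once, up front
for i in range(n):
    x_array[i] = i * 0.5      # assign in place instead of appending

# By contrast, repeated np.append() copies the whole array on every call,
# which is why computing the size in advance matters.
```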


How to concatenate two arrays in Fortran 90

I have an original array called pres_lev3d, whose size is defined by pres_lev3d(im*jm, levsi), where im*jm is 72960 and levsi is 64. This corresponds to global atmospheric data, thus the size. The array is allocatable: real (kind=kind_io8), allocatable :: pres_lev3d(:, :). I have a second array, pres_1d, whose size is defined in a similar fashion, pres_1d(im*jm, levsi), but in this array levsi is 1.
I need to concatenate both arrays (technically a 2d and a 1d array) into an array of shape (/72960, 65/). In MATLAB this seems like a very simple process, but I can't seem to find an easy way to get around it in Fortran 90.
I have tried to create a third array
pres_lev=(/pres_lev3d, pres_1d/)
and also tried to use merge, but neither approach seems to work.
I am fairly new to Fortran.
If I've followed your explanation correctly, this would probably work:
real(kind_io8), dimension(72960,65) :: out_array
...
out_array(:,1:64) = pres_lev3d
out_array(:,65) = pres_1d
If that's not easy enough, or if I've misunderstood your question, explain further. To allocate out_array to conform to your input arrays, try something like:
real(kind_io8), dimension(:,:), allocatable :: out_array
...
allocate(out_array(size(pres_lev3d,1),size(pres_lev3d,2)+1))
...
out_array(:,1:64) = pres_lev3d
out_array(:,65) = pres_1d

Need to port 2D Arrays to Bash 3.2.39 from 4.1.9

I have written a script that uses many two-dimensional arrays declared by using:
declare -A array_name
I've since added a lot of code (about 800 lines) that uses these arrays. The issue is that I have written and tested this in Bash 4.2.9(1), and now I need to have this code compatible with Bash 3.2.39(1) which clearly doesn't support that declaration (I get errors up the wazoo).
Here are the errors I'm getting if you're interested:
./script.sh: line 1072: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
For example I would set the arrays in this fashion:
array_name[1,4]="pls help"
And then output or access them in this fashion:
printf "%s" "${array_name[1,4]}"
Of course with a much bigger array with sizes that range from 1x20 to 70x15 with more useful information stored in them.
Is there any way to do this without changing much of Bash 4.X's functionality to access and store things in arrays?
Bash does not really have 2D arrays. What you're using are 'associative arrays' where the subscript is a string (but your strings happen to contain commas and look a bit like two subscripts). Associative arrays are wonderful but not easily simulated. It is sufficiently hard that I would not try.
You could (sort of, extremely clunkily) do it with two arrays: an array to hold the '2D' subscripts as values each with an index number, and an ordinary integer indexed array that holds the real values. You'd need a function to add a new entry and another to lookup an entry. And you'd then have horrendous subscript expressions:
${array1[$(lookup "$subscript1")]}
or thereabouts. But that is horrid and requires an O(N) linear search to find the subscripts, etc.
You'd probably find it a lot easier to code in a language that has native support for associative arrays (Awk), hashes (Perl) or dictionaries (Python), etc. And you'd be able to code to the same other language on both systems.
Or simply get Bash 4.x installed on the other machines; that might be simpler still. It needn't (probably shouldn't) be installed over the standard Bash 3.x; you could install it as /bin/bash4 and use that in the shebang line #!/bin/bash4, or install it in /usr/local/bin or wherever else you usually install software.
Wittgenstein said "The limits of my language mean the limits of my world." In this context, the absence of associative arrays in a language (Bash 3.x) is a distinct limitation on the world of programs that you can write.
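As the answer suggests, a language with native dictionaries makes this trivial. A minimal Python sketch of the same comma-keyed idea, using (row, column) tuples as keys:

```python
# A "2D array" as a dict keyed by (row, column) tuples -- the same idea
# that Bash 4's associative arrays emulate with "1,4"-style string keys.
table = {}
table[1, 4] = "pls help"
table[70, 15] = "more data"

print(table[1, 4])                 # O(1) lookup by coordinate pair
for (row, col), value in sorted(table.items()):
    print(row, col, value)         # iterate in subscript order
```

Unlike the two-array workaround described above, lookups here are constant time rather than an O(N) linear search.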

R external interface

I would like to implement an R package written in C.
The C code must:
take an array (of any type) as input.
produce an array (of unpredictable size) as output.
What is the best practice of implementing array passing?
At the moment the C code is called with .C(). It accesses the array directly from R, through a pointer. Unfortunately the same can't be done for the output, as its dimensions need to be known in advance, which is not true in my case.
Would it make sense to pass the array from C to R through a file, for example in ramfs on Linux?
Update 1:
The exact same problem was discussed here.
There, possible option of returning array with unknown dimensions was mentioned:
Split the external function in two at the point where the dimensions are known but before the array is calculated. The first part would return the dimensions; an empty array would then be prepared in R, and the second part would run and populate it.
In my case the full dimensions are known only once the whole code has executed, so this method would mean running the C code twice. Guessing a maximal array size isn't an option either.
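For what it's worth, the two-pass idea from Update 1 can be sketched language-agnostically (Python here for brevity; the function names are made up):

```python
# Two-pass pattern: the first call reports only the output size,
# the second call fills a buffer preallocated by the caller.

def compute_dims(data):
    # phase 1: a cheap pass that only determines the result size
    return sum(1 for x in data if x > 0)

def fill_result(data, out):
    # phase 2: populate the caller-allocated buffer
    i = 0
    for x in data:
        if x > 0:
            out[i] = x * 2
            i += 1

data = [3, -1, 4, -1, 5]
n = compute_dims(data)       # caller learns the size...
out = [0.0] * n              # ...allocates the buffer...
fill_result(data, out)       # ...and the second pass fills it
```

As the update notes, this only pays off when the size-finding pass is cheap relative to the full computation.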
Update 2: It seems the only way to do this is to use .Call() instead, as power suggested. Here are a few good examples: http://www.sfu.ca/~sblay/R-C-interface.ppt.
Thanks.
What is the best practice of implementing array passing?
Is the package already written in ANSI C? .C() would then be quick and easy.
If you are writing from scratch, I suggest .Call() and Rcpp. In this way, you can pass R objects to your C/C++ code.
Would it make sense to pass the array through a file?
No.
Read "Writing R Extensions".

Allocating arrays of the same size

I'd like to allocate an array B to be of the same shape and have the same lower and upper bounds as another array A. For example, I could use
allocate(B(lbound(A,1):ubound(A,1), lbound(A,2):ubound(A,2), lbound(A,3):ubound(A,3)))
But not only is this inelegant, it also gets very annoying for arrays of (even) higher dimensions.
I was hoping for something more like
allocate(B(shape(A)))
which doesn't work, and even if this did work, each dimension would start at 1, which is not what I want.
Does anyone know how I can easily allocate an array to have the same size and bounds as another array easily for arbitrary array dimensions?
As of Fortran 2008, there is now the MOLD optional argument:
ALLOCATE(B, MOLD=A)
The MOLD= specifier works almost in the same way as SOURCE=. If you specify MOLD= and source_expr is a variable, its value need not be defined. In addition, MOLD= does not copy the value of source_expr to the variable to be allocated.
Source: IBM Fortran Ref
You can define it in a preprocessor macro, but each macro is tied to a fixed rank:
#define DIMS3D(my_array) lbound(my_array,1):ubound(my_array,1),lbound(my_array,2):ubound(my_array,2),lbound(my_array,3):ubound(my_array,3)
allocate(B(DIMS3D(A)))
Don't forget to compile with the preprocessor enabled, e.g. the -cpp option for gfortran.
If using Fortran 2003 or above, you can use the source argument:
allocate(B, source=A)
but this will also copy the elements of A to B.
If you are doing this a lot and think it too ugly, you could write your own subroutine, say copy_dims(template, new_array), encapsulating the source line you show. You could even set up a generic interface so that it can handle arrays of several ranks; see how to write wrapper for 'allocate' for an example of that concept.

Passing Numpy arrays to C code wrapped with Cython

I have a small bit of existing C code that I want to wrap using Cython. I want to be able to set up a number of numpy arrays, and then pass those arrays as arguments to the C code whose functions take standard c arrays (1d and 2d). I'm a little stuck in terms of figuring out how to write the proper .pyx code to properly handle things.
There are a handful of functions, but a typical function in the file funcs.h looks something like:
double InnerProduct(double *A, double **coords1, double **coords2, const int len)
I then have a .pyx file that has a corresponding line:
cdef extern from "funcs.h":
double InnerProduct(double *A, double **coords1, double **coords2, int len)
where I got rid of the const because Cython doesn't support it. Where I'm stuck is what the wrapper code should then look like to pass an MxN numpy array to the **coords1 and **coords2 arguments.
I've struggled to find the correct documentation or tutorials for this type of problem. Any suggestions would be most appreciated.
You probably want Cython's "typed memoryviews" feature, which you can read about in full gory detail here. This is the newer, more unified way to work with numpy or other arrays. Memoryviews can be created from numpy arrays on the Python side, or exported back to Python (for example, here). You have to pay attention to how the striding works and make sure you're consistent about, e.g., C-contiguous vs. Fortran-ordered arrays, but the docs are pretty clear on how to do that.
Without knowing a bit more about your function it's hard to be more concrete on the best way to do this, i.e., is the C function read-only with respect to the arrays? (I think yes, based on the signature you gave, but am not 100% sure.) If so, you don't need to worry about making copies to get C-contiguous data, because the C function doesn't need to write back to the Python-level numpy array. Either way, typed memoryviews will let you do this with a minimum of fuss.
The Cython interface code should be created according to the tutorial given here.
To get a C pointer to the data in a numpy array, you should use the ctypes attribute of the numpy array, which is described here.
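As a rough Python-side illustration of that ctypes attribute (treating double** as an array of row pointers is an assumption about the C API here, not something the question confirms):

```python
import numpy as np

# Ensure the data is laid out contiguously in C (row-major) order
a = np.ascontiguousarray(np.arange(6, dtype=np.float64).reshape(2, 3))

ptr = a.ctypes.data            # integer address of element [0, 0]
# A C function taking double** typically expects an array of row
# pointers; each row starts strides[0] bytes after the previous one
# (3 doubles per row = 24 bytes for this 2x3 array):
row_ptrs = [a.ctypes.data + i * a.strides[0] for i in range(a.shape[0])]
```

In real wrapper code these addresses would be packed into a ctypes pointer array (or, more idiomatically, handled by Cython itself via a typed memoryview as the other answer describes).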
