CuPy sparse kernels

What is the API for writing a custom (ElementwiseKernel or RawKernel) kernel that works on sparse-matrix instances in CuPy? If I want to write a kernel that can take cupyx.scipy.sparse.csr_matrix instances as arguments, what arguments does the underlying CUDA code need to take?

Found it (at least approximately). For a CSR sparse matrix s, the fields s.data, s.indices, and s.indptr contain, respectively: a dense array of length s.nnz holding the nonzero values, a dense array of length s.nnz holding the column index of each value, and an array of length (number of rows + 1) holding the offsets into data/indices at which each row begins. These are all ordinary cupy.ndarrays and can be passed into a kernel, and then on to cuSPARSE functions, as normal.
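As a concrete CPU-side sketch of that layout, here is the CSR triple built by hand in plain Python; cupyx.scipy.sparse.csr_matrix holds exactly these three arrays (as cupy.ndarrays) in its .data, .indices, and .indptr fields:

```python
# Build the CSR representation of a small dense matrix by hand.
dense = [[0, 2, 0],
         [3, 0, 4]]

data, indices, indptr = [], [], [0]
for row in dense:
    for col, v in enumerate(row):
        if v != 0:
            data.append(v)       # nonzero values (length nnz)
            indices.append(col)  # column index of each value
    indptr.append(len(data))     # offset where the next row starts

print(data, indices, indptr)  # [2, 3, 4] [1, 0, 2] [0, 1, 3]
```

A kernel that consumes a csr_matrix therefore takes three array arguments (plus the shape/nnz as scalars), iterating row r over the half-open range indptr[r] to indptr[r+1].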

Related

Passing subarray of 2-dimensional array in FORTRAN-77

I have 2-dimensional array
real triangle(0:2, 0:1)
where "triangle" is an array of vectors (1-dim arrays)
I also have a subroutine
subroutine vecSub(lhs, rhs, result)
real lhs(0:1), rhs(0:1), result(0:1)
result(0) = lhs(0) - rhs(0)
result(1) = lhs(1) - rhs(1)
return
end
Is there any way to pass one of the vectors from the "triangle" variable to this subroutine? Fortran 90 can do this: triangle(0, :) gives the first vector of triangle, but I'm only allowed to use FORTRAN 77, so that won't do. Any suggestions?
@Javier Martin wrote "not with the current layout of your array", but missed the opportunity to suggest an alternative.
If instead you declared the variable as follows:
real triangle(0:1, 0:2)
reversing the order of the bounds, you could then pass triangle(0,0), triangle(0,1) or triangle(0,2) to the subroutine and get exactly the behavior you want, due to a Fortran feature called "sequence association". When you pass a single array element to a dummy argument that is an array, you are implicitly passing that and following elements, in array element order. This is about the only allowed violation of the normal Fortran shape-matching rules, and was part of FORTRAN 77.
No, not with the current layout of your array, because of two reasons:
Fortran uses an array element order in which the leftmost dimension is contiguous. That is, in an array of size (n,m,l), the strides between consecutive indices in each dimension are (1, n, n*m), measured in units of array elements (that is, not bytes).
F77 does not include assumed-shape arrays a(:), which are generally implemented by passing a small descriptor structure that communicates details like the stride or the number of elements. Instead, you can only use assumed-size arrays a(*), which are normally a pointer to the first element, rather like C arrays. You have to pass the length as a separate argument, and the array elements have to be contiguous.
This is the reason why you can "pass a subarray" to an F77 subroutine as long as that subarray is, e.g., a matrix column: the elements therein are contiguous.
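A small sketch of that element order, with plain Python standing in for Fortran's column-major layout, shows why a column of an (n,m) array is contiguous while a row is not:

```python
# Column-major (Fortran) layout for a 2x3 array, mirroring why
# declaring real triangle(0:1, 0:2) makes each 2-vector contiguous.
n, m = 2, 3  # bounds like triangle(0:1, 0:2)

def flat_index(i, j):
    # Leftmost index varies fastest: strides are (1, n).
    return i + j * n

# Memory order of the elements (index pairs in storage order):
order = [(i, j) for j in range(m) for i in range(n)]
print(order)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Elements (0,j) and (1,j) sit at consecutive flat indices, so passing triangle(0,j) to a subroutine expecting a length-2 array picks up exactly that contiguous vector via sequence association.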
A possible solution (one that many current Fortran compilers implement) is that when you try to pass a non-contiguous subarray to a function that is not known to accept them, they make a copy of the array, and even write it back in memory if required. This would be equivalent to:
! Actual array
integer m(3,5)
integer dummy(5)
dummy = m(2,:)
call myF77sub(dummy, 5)
m(2,:) = dummy
However, as others are saying, you should try not to call F77 functions directly, but either adapt them to or at least wrap them in more recent Fortran interfaces. Then you can have code like the above in the wrapper, and call that wrapper "normally" from modern Fortran routines. Then you may eventually get around to rewriting the actual implementation in modern Fortran without affecting client code.

Sorting multiple vectors with "slices" distributed across many files

I'm dealing with a big data problem: I've got some large number of arrays (~1M) that are distributed across a large number of files (~1k). The data is organized so that the ith file contains the ith entry of all arrays. If the overall cost of my algorithm is determined by the number of files that I need to open (and assuming only one file can be opened at a time), is there a strategy to simultaneously sort all of the arrays in-place so as to minimize the overall cost?
Note that the data is far too large for everything to be stored in memory, but there should be no problem storing ~10 entries from all arrays in memory (i.e. 10x1M values).
The question is missing some information: it doesn't say whether each array is itself already sorted. I am going to answer assuming the arrays are not already sorted.
The data is organized so that the ith file contains the ith entry of all arrays.
From this, I can assume this -
file i
------------
arr1[i]
arr2[i]
arr3[i]
...
...
arrN[i] # N = ~1M
You mentioned that the number of arrays is ~1M and the number of files ~1K, so according to this no array will contain more than 1K elements; otherwise more files would be required.
Each file contains 1M elements.
....but there should be no problem storing ~10 entries from all arrays in memory (i.e. 10x1M values).
So we should be able to load all elements of one file in memory, since that is no more than ~1M elements. Load each file in memory, sort its elements, and write the sorted result back.
Then apply a K-way merge using a min-heap to combine the 1K files holding sorted elements. This step will need to load about c * 1M elements in memory, where c is a small constant (c < 3).
Let me know if you have any trouble understanding K-way merging.
Hope it helps!
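A minimal sketch of the merge step in Python, using the standard library's heapq.merge (which implements exactly this min-heap K-way merge); each list stands in for one sorted file:

```python
import heapq

# Each inner list plays the role of one already-sorted file.
sorted_files = [
    [1, 4, 9],
    [2, 3, 11],
    [5, 6, 7],
]

# heapq.merge streams the inputs, keeping only one element per
# file in the heap at a time (K elements total, not K full files).
merged = list(heapq.merge(*sorted_files))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 9, 11]
```

With real files you would replace each list with a generator that reads one file lazily, so memory use stays proportional to the number of files, not their total size.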

Igraph_vector_init_finally and igraph_vector_cumsum

So I'm looking at the source code of the igraph library for C, because I need to create a new type of graph which is not included in that library but is somehow related to a fitness-model graph for scale-free networks. While reading the code that builds such a graph, I've found that these functions are called on many occasions:
(void) IGRAPH_VECTOR_INIT_FINALLY(igraph_vector*,long_int);
(void) igraph_vector_cumsum(igraph_vector*,igraph_vector*);
I can't seem to locate these anywhere in the source folder, and I've searched online, but I can't find what they do. For example, in one portion of the code I have:
/* Calculate the cumulative fitness scores */
IGRAPH_VECTOR_INIT_FINALLY(&cum_fitness_out, no_of_nodes);
IGRAPH_CHECK(igraph_vector_cumsum(&cum_fitness_out, fitness_out));
max_out = igraph_vector_tail(&cum_fitness_out);
p_cum_fitness_out = &cum_fitness_out;
where cum_fitness_out is an empty vector, no_of_nodes is the number of nodes, IGRAPH_CHECK is a macro that checks the return value of igraph_vector_cumsum, and igraph_vector_tail returns the last element of a vector...
IGRAPH_VECTOR_INIT_FINALLY(vector, size) is a shorthand for:
IGRAPH_CHECK(igraph_vector_init(vector, size));
IGRAPH_FINALLY(igraph_vector_destroy, vector);
Basically, it initializes a vector with the given number of undefined elements, checks whether the memory allocation was successful, and then puts the vector on top of the so-called finally stack that contains the list of pointers that should be destroyed in case of an error in the code that follows. More information about IGRAPH_FINALLY can be found in igraph's error-handling documentation.
igraph_vector_cumsum() calculates the cumulative sum of a vector; its source can be found in src/vector.pmt. .pmt stands for "poor man's templates"; it is essentially a C source file with a bunch of macros that allow the library to quickly "generate" the same data type (e.g., vectors) for different base types (integers, doubles, Booleans etc) with some macro trickery.
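As a hedged sketch of why the fitness code wants a cumulative sum (the selection step below is my illustration, not igraph's actual code): the running sum plus a binary search gives fitness-proportional node sampling.

```python
import bisect
import random

# Per-node fitness scores (hypothetical values).
fitness = [0.5, 2.0, 1.5]

# Running sum, playing the role of cum_fitness_out.
cum, total = [], 0.0
for f in fitness:
    total += f
    cum.append(total)
print(cum)  # [0.5, 2.5, 4.0]

# Pick a node with probability fitness[i] / total: draw a uniform
# value up to the tail of the cumulative vector, then binary-search.
r = random.uniform(0.0, cum[-1])
node = bisect.bisect_right(cum, r)
```

This matches the snippet in the question: igraph_vector_tail(&cum_fitness_out) retrieves that same final total (max_out) to bound the random draw.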

LabVIEW variable array size in SubVIs on FPGA

I have acquisition code running on a cRIO FPGA target. The data is acquired from the I/O nodes and composed into an array. This array should always be of the same size, so I check that with a SubVI. The problem is that I use conditional disable structures to replace the acquisition code for different targets with different channel counts. Now the compiler complains that it can't resolve the array to a fixed size, which is not true, because the compiler could determine it very easily.
How do I write my SubVI so that it accepts an array whose size varies (at compile time)? The "Array Size" function from the array palette can do this too. How?
You can use lookup tables instead to achieve your goal. Or, if you have to send this array to an RT VI, it would be cleaner to use a DMA FIFO instead. On the RT side you can use the polling method and read as many points as you like at a time.
In short this is not possible with standard LabVIEW arrays as the size must be fixed for compilation (as these basically come down to wires in the chip).
There are two options when you actually need a variable size:
Simple and wasteful - If there is a reasonable upper bound, you can set the array to that size and use logic to track the "end". This means compiling resources for the upper bound, and if that is more than a few hundred bytes it will use up a lot of logic.
Scalable but slightly harder - The only way to achieve a large variable size array is to use some of the memory options available with some added logic for defining the size. Depending on the size you can either use look up tables (LUTs) or block RAM. Again LUTs use up logic quickly so should only be used for small arrays (Can't remember the exact size recommended but probably < 500 bytes). If you've not used it you can find some initial reading at http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/fpga_storing_data/#Memory_Items
Either way you will have to somehow pass the SubVI the size of the array so it knows how far into the memory to read; this would simply have to be another input.
More commonly in LabVIEW FPGA most processing is done on point-by-point data so you can centralise the storage logic without having to pass this around, however this depends on the nature of the algorithm.

Does an empty array take up space?

After looking online, it seems like this depends on the language being used?
However, I couldn't find any place with a list of programming languages and whether this is true for each one.
Could someone confirm this and perhaps list a couple of languages where initializing an empty array takes up the same amount of space as if it were filled, and a couple where it doesn't?
So, for example, if I were somehow restricted to only be able to load 1000 integers into memory, could I initialize a 100000 integer empty array (assuming the empty array wouldn't take up the same amount of space as if it were filled)?
As you mention, this is hard (impossible) to answer without a target language.
However, languages that use linked-lists to join array elements together are more likely to not pre-allocate memory for undefined elements.
If you need to reduce your memory footprint to minimum (as in your 100000 element example) you could use an add-in library that implements linked lists outside the language of your choice.
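One concrete data point, in CPython (the list and dict here are my illustration, not a general claim about all languages): a list preallocated to 100000 slots really does pay for all of them up front, while a dict can act as a sparse "array" that stores only the defined entries.

```python
import sys

empty = []
full = [0] * 100000      # 100000 slots allocated immediately
sparse = {99999: 7}      # only the one defined "element" is stored

print(sys.getsizeof(empty) < sys.getsizeof(full))   # True
print(sys.getsizeof(sparse) < sys.getsizeof(full))  # True
```

So under a 1000-integer memory budget, the dict-style sparse representation of a "100000-element array" would fit, while the preallocated list would not.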