Type hinting numpy arrays and batches

I'm trying to create a few array types for a scientific python project. So far, I have created generic types for 1D, 2D and ND numpy arrays:
from typing import Any, Generic, Protocol, Tuple, TypeVar
import numpy as np
from numpy.typing import _DType, _GenericAlias
Vector = _GenericAlias(np.ndarray, (Tuple[int], _DType))
Matrix = _GenericAlias(np.ndarray, (Tuple[int, int], _DType))
Tensor = _GenericAlias(np.ndarray, (Tuple[int, ...], _DType))
The first issue is that mypy says that Vector, Matrix and Tensor are not valid types (e.g. when I try myvar: Vector[int] = np.array([1, 2, 3]))
The second issue is that I'd like to create a generic type Batch that I'd like to use like so: Batch[Vector[complex]] should be like Matrix[complex], Batch[Matrix[float]] should be like Tensor[float], and Batch[Tensor[int]] should be like Tensor[int]. I am not sure exactly what I mean by "should be like"; I guess I mean that mypy should not complain.
How do I go about this?

You should not be using protected members (names starting with an underscore) from the outside. They are typically marked this way to indicate implementation details that may change in the future, which is exactly what happened here between versions of numpy. For example, in numpy 1.24 your import line from numpy.typing fails at runtime because the members you try to import are no longer there.
There is no need to use internal alias constructors because numpy.ndarray is already generic in terms of the array shape and its dtype. You can construct your own type aliases fairly easily. You just need to ensure you parameterize the dtype correctly. Here is a working example:
from typing import Tuple, TypeVar
import numpy as np
T = TypeVar("T", bound=np.generic, covariant=True)  # dtype scalar type parameter

Vector = np.ndarray[Tuple[int], np.dtype[T]]        # 1D: shape (n,)
Matrix = np.ndarray[Tuple[int, int], np.dtype[T]]   # 2D: shape (n, m)
Tensor = np.ndarray[Tuple[int, ...], np.dtype[T]]   # ND: arbitrary rank
Usage:
def f(v: Vector[np.complex64]) -> None:
    print(v[0])

def g(m: Matrix[np.float_]) -> None:
    print(m[0])

def h(t: Tensor[np.int32]) -> None:
    print(t.reshape((1, 4)))

f(np.array([0j+1]))                    # prints (1+0j)
g(np.array([[3.14, 0.], [1., -1.]]))   # prints [3.14 0. ]
h(np.array([[3.14, 0.], [1., -1.]]))   # prints [[ 3.14 0. 1. -1. ]]
The issue currently is that shapes have almost no typing support, but work is underway to implement that using the new TypeVarTuple capabilities provided by PEP 646. Until then, there is little practical use in discriminating the types by shape.
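For a flavor of what that shape support could look like, here is a minimal sketch of the PEP 646 syntax (Python 3.11+); note that the Array class below is purely hypothetical, since np.ndarray does not accept variadic shape parameters yet:
from typing import Generic, TypeVarTuple, Unpack

Shape = TypeVarTuple("Shape")

class Array(Generic[Unpack[Shape]]):
    """Hypothetical array type whose shape is tracked by the type checker."""

def outer(v: Array[int], w: Array[int]) -> Array[int, int]:
    """Sketch: an outer product maps two 1D arrays to a 2D array."""
    raise NotImplementedError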
The batch issue should be a separate question. Try and ask one question at a time.

Related

Does blockwise allow iteration over out-of-core arrays?

The blockwise docs mention that with concatenate=False:
In the case of a contraction the passed function should expect an iterable of blocks on any array that holds that index.
My question then is whether or not there is a fundamental limitation that would prohibit this "iterable of blocks" from loading the blocks one at a time rather than keeping them all in a list (i.e. in memory). Is this possible? It does not look like blockwise works this way now, but I am wondering if it could:
import dask.array as da
import numpy as np

# Create an array and write to disk
x = da.random.random(size=(10, 6), chunks=(5, 3))
da.to_zarr(x, '/tmp/x.zarr', overwrite=True)
x = da.from_zarr('/tmp/x.zarr')
y = x.T

def fn(x, y):
    print(type(x), type(x[0]))
    x = np.concatenate(x, axis=1)
    y = np.concatenate(y, axis=0)
    return np.matmul(x, y)

da.blockwise(fn, 'ik', x, 'ij', y, 'jk', concatenate=False, dtype='float').compute(scheduler='single-threaded')
# <class 'list'> <class 'numpy.ndarray'>
Is it possible for these lists to be generators instead?
This was true very early on in Dask, but we switched to concrete lists eventually. Today a task does not start until all of its dependency tasks are available in memory.
Given the context of your question I'm guessing that you're running up against memory issues with tensordot style applications. The memory use of tensordot style applications depends heavily on chunk structure. I encourage you to look at this issue, and especially at the talk referenced in the first post: https://github.com/dask/dask/issues/2225
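For illustration, a hedged sketch (not code from that issue) of how chunk structure drives memory use in such contractions; chunks that span the whole contracted axis keep each task's block list short:
import dask.array as da

# Chunks span the full contracted axis (the 1_000-wide second axis of x),
# so each output block of the product depends on a single pair of blocks
# rather than a long list of them.
x = da.random.random(size=(10_000, 1_000), chunks=(1_000, 1_000))
z = da.matmul(x, x.T)
print(z.chunks)  # ((1000,) * 10, (1000,) * 10)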

How to make a square root fit to data in Julia 1.0

I have a very simple question. How would one fit a square root model to a dataset in Julia? I'm currently using the GLM package, which works very well with linear data. I need to plot a phase velocity as a function of string tension, and it seems like @formula(v ~ sqrt(T)) does not work in:
import GLM, DataFrames  # No global namespace imports

df = DataFrames.DataFrame(
    v = [1, 1.5, 1.75],
    T = [1, 2, 3]
)
fit = GLM.glm(GLM.@formula(v ~ T^(1/2)), vs)
Is GLM at all viable here, or do I need to resort to another package such as LsqFit?
You can use sqrt in your model formula. Just do it e.g. like this:
GLM.lm(GLM.@formula(v ~ sqrt(T)), df)
If you want to fit a linear model, use the lm function; the second argument should be a data frame, which is df in your case.

Pycharm typehint for Numpy array of objects

I have multiple numpy arrays holding different objects. How can I give a type hint to PyCharm so that I can get the methods of that class while coding? For example,
import numpy as np

class Test:
    def do_task(self):
        pass

data = [Test(), Test(), Test()]
my_array = np.array(data, dtype=object)  # object dtype holds arbitrary Python instances

def test_hint(array: np.ndarray):  # <-- I can give the typehint np.ndarray, but I also want to hint what kind of array it is.
    for a in array:
        a....  # Here I want PyCharm to give me the list of methods from the 'Test' class
I can always explicitly mention what kind of object it is, like the following:
for a in array:
    temp = a  # type: Test
    temp...  # Here PyCharm will give me all the suggestions.
Is there any way I can provide the type hint in the constructor so that I do not need to write an additional line of code to declare the type of the data? I tried looking at python's typing module for some clue, but couldn't find any.
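For what it's worth, one sketch of a workaround (an assumption on my part, not an official PyCharm feature): annotate the parameter as Sequence[Test] instead of np.ndarray. PyCharm's inspector then infers the element type when iterating, though a strict checker like mypy may object that ndarray is not a Sequence:
from typing import Sequence

def test_hint(array: Sequence[Test]) -> None:  # Test is the class defined above
    for a in array:
        a.do_task()  # PyCharm can now suggest Test's methods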

TensorFlow - Cannot get the shape of matrix with the get_shape command

I can't seem to get the shape of the tensor when I do
get_shape().as_list()
Here is the code I have written:
import tensorflow as tf

matrix1 = tf.placeholder(tf.int32)
matrix2 = tf.placeholder(tf.int32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a = sess.run(matrix1, {matrix1: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
    b = sess.run(matrix2, {matrix2: [[10, 11, 12], [13, 14, 15], [16, 17, 18]]})
    print(a.get_shape().as_list())  # ERROR
I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'get_shape'
I want to know the shape of the matrix so that I can take in an arbitrary matrix and loop through its rows and columns.
Just summarizing the discussion in the comments with a few notes.
Both matrix1 and a are multidimensional arrays, but there is a difference:
matrix1 is an instance of tf.Tensor, which supports two ways to access the shape: matrix1.shape attribute and matrix1.get_shape() method.
The result of tf.Tensor evaluation, a, is a numpy ndarray, which has just a.shape attribute.
Historically, tf.Tensor had only the get_shape() method; shape was added later to make it similar to numpy. One more note: in tensorflow, a tensor's shape can be dynamic (as in your example), in which case neither get_shape() nor shape will return concrete numbers. In that case, one can use the tf.shape function to access the shape at runtime (here's an example of when it might be useful).
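As a small illustration of these distinctions (a sketch against the TF 1.x API from the question):
import tensorflow as tf

matrix1 = tf.placeholder(tf.int32)   # no static shape supplied
print(matrix1.shape)                 # <unknown>: tf.Tensor attribute
print(matrix1.get_shape())           # <unknown>: equivalent method

with tf.Session() as sess:
    a = sess.run(matrix1, {matrix1: [[1, 2, 3], [4, 5, 6]]})
    print(a.shape)                   # (2, 3): plain numpy attribute
    # Reading a dynamic shape inside the graph requires tf.shape:
    print(sess.run(tf.shape(matrix1), {matrix1: [[1, 2], [3, 4]]}))  # [2 2]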

Convert numpy array to MemoryView object

I'm trying to convert a numpy array to a memoryview object because I have to communicate between two programs. One of them can only handle NumPy arrays and the other only memoryview objects.
Converting from MemoryView to numpy array is easily done by:
import numpy as np
MyNumpyArray=np.array(MyMemoryView)
But how do you convert in the other direction, from a numpy array to a memoryview?
I found here: https://docs.python.org/3/c-api/memoryview.html that there's a PyMemoryView_FromObject(PyObject *obj) function, but I don't know how to call it without an example.
Thanks!
memoryview is one of the built-in types and can simply be called as:
import numpy as np

arr = np.random.rand(5, 4)
view = memoryview(arr)
print(view)  # <memory at 0x12699c318>
In addition to the accepted answer, here is another simple method to get a memoryview out of a NumPy array:
import numpy as np

a = np.arange(1, 9)
view = a.data
print(type(view))  # <class 'memoryview'>
In other words, the .data attribute of an ndarray is exactly a memoryview.
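Putting both directions together, a short round-trip sketch (assuming a contiguous array, since memoryview exposes the raw buffer):
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
view = memoryview(arr)       # zero-copy view of arr's buffer
back = np.array(view)        # copies the data into a fresh array
shared = np.asarray(view)    # zero-copy: shares arr's memory
arr[0, 0] = 99.0
print(back[0, 0])            # 0.0  (independent copy)
print(shared[0, 0])          # 99.0 (same underlying buffer)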
