Does blockwise allow iteration over out-of-core arrays?

The blockwise docs mention that with concatenate=False:
In the case of a contraction the passed function should expect an iterable of blocks on any array that holds that index.
My question then is whether or not there is a fundamental limitation that would prohibit this "iterable of blocks" from loading the blocks one at a time rather than keeping them all in a list (i.e. in memory). Is this possible? It does not look like blockwise works this way now, but I am wondering if it could:
import dask.array as da
import numpy as np
# Create an array and write to disk
x = da.random.random(size=(10, 6), chunks=(5, 3))
da.to_zarr(x, '/tmp/x.zarr', overwrite=True)
x = da.from_zarr('/tmp/x.zarr')
y = x.T
def fn(x, y):
    # With concatenate=False, x and y arrive as lists of blocks along the contracted index j
    print(type(x), type(x[0]))
    x = np.concatenate(x, axis=1)
    y = np.concatenate(y, axis=0)
    return np.matmul(x, y)
da.blockwise(fn, 'ik', x, 'ij', y, 'jk', concatenate=False, dtype='float').compute(scheduler='single-threaded')
# <class 'list'> <class 'numpy.ndarray'>
Is it possible for these lists to be generators instead?

This was true very early on in Dask, but we switched to concrete lists eventually. Today a task does not start until all of its dependency tasks are available in memory.
Given the context of your question I'm guessing that you're running up against memory issues with tensordot style applications. The memory use of tensordot style applications depends heavily on chunk structure. I encourage you to look at this issue, and especially at the talk referenced in the first post: https://github.com/dask/dask/issues/2225
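For illustration only, here is a minimal sketch of what a streamed contraction looks like when written by hand with plain NumPy. It is not how blockwise works (as noted above, Dask materializes the blocks as lists), and the names streamed_contraction and block_pairs are hypothetical helpers made up for this example. Only one pair of blocks plus the running total needs to be alive at any time.

```python
import numpy as np

def streamed_contraction(block_pairs):
    """Sum x_j @ y_j over an iterable of (x_j, y_j) block pairs, one pair at a time."""
    total = None
    for x_block, y_block in block_pairs:
        partial = x_block @ y_block
        total = partial if total is None else total + partial
    return total

def block_pairs(x, y, width):
    """Hypothetical loader: lazily yields column blocks of x with the matching row blocks of y."""
    for start in range(0, x.shape[1], width):
        yield x[:, start:start + width], y[start:start + width, :]

rng = np.random.default_rng(0)
x = rng.random((5, 6))
y = rng.random((6, 4))

# The generator is consumed lazily, so blocks are produced only when needed.
result = streamed_contraction(block_pairs(x, y, width=3))
```

In a real out-of-core setting the generator would read each block from disk instead of slicing an in-memory array; the accumulation loop would stay the same.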

Related

Query on TFP Probabilistic Model

In the TFP tutorial, the model output is a Normal distribution. I noted that the output can be replaced by an IndependentNormal layer. In my model, y_true is a binary class label, so I used an IndependentBernoulli layer instead of an IndependentNormal layer.
After building the model, I found that it has two output parameters. This doesn't make sense to me, since a Bernoulli distribution has only one parameter. Do you know what went wrong?
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import Sequential
tfd = tfp.distributions
tfpl = tfp.layers
# Define the prior weight distribution as Normal of mean=0 and stddev=1.
# Note that, in this example, the prior distribution is not trainable,
# as we fix its parameters.
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model
# Define variational posterior weight distribution as multivariate Gaussian.
# Note that the learnable parameters for this distribution are the means,
# variances, and covariances.
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
# Create a probabilistic DL model (X_train is the training data, defined elsewhere)
model = Sequential([
    tfpl.DenseVariational(units=16,
                          input_shape=(6,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='relu'),
    tfpl.DenseVariational(units=16,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentBernoulli.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0]),
    tfpl.IndependentBernoulli(1, convert_to_tensor_fn=tfd.Bernoulli.logits)
])
model.summary()
[Screenshot: output of model.summary() from running the code on Google Colab]
I agree the summary display is confusing but I think this is an artifact of the way tfp layers are implemented to interact with keras. During normal operation, there will only be one return value from a DistributionLambda layer. But in some contexts (that I don't fully grok) DistributionLambda.call may return both a distribution and a side-result. I think the summary plumbing triggers this for some reason, so it looks like there are 2 outputs, but there will practically only be one. Try calling your model object on X_train, and you'll see you get a single distribution out (its type is actually something called TensorCoercible, which is a wrapper around a distribution that lets you pass it into tf ops that call tf.convert_to_tensor -- the resulting value for that op will be the result of calling your convert_to_tensor_fn on the enclosed distribution).
In summary, your distribution layer is fine but the summary is confusing. It could probably be fixed; I'm not keras-knowledgeable enough to opine on how hard it would be.
Side note: you can omit the event_shape=1 parameter -- the default value is (), or "scalar", which will behave the same.
HTH!

Passing shared memory variables in python multiprocessing

I have a bunch of files that I want to read in parallel using Python's multiprocessing and collect all the data in a single NumPy array. For this purpose, I want to define a shared memory NumPy array and pass its slices to different processes to read in parallel. A toy illustration of what I am trying to do is given in the following code where I am trying to modify a numpy array using multiprocessing.
Example 1:
import numpy as np
import multiprocessing
def do_stuff(i, arr):
    arr[:]=i
    return
def print_error(err):
    print(err)
if __name__ == '__main__':
    idx = [0,1,2,3]
    # Need to fill this array in parallel
    arr = np.zeros(4)
    p = multiprocessing.Pool(4)
    # Passing slices to arr to modify using multiprocessing
    for i in idx:
        p.apply(do_stuff, args=(i,arr[i:i+1]))
    p.close()
    p.join()
    print(arr)
In this code, I want the arr to be filled with 0, 1, 2, 3. This however prints arr to be all zeros. After reading the answers here, I used multiprocessing.Array to define the shared memory variable and modified my code as follows
Example 2:
import numpy as np
import multiprocessing
def do_stuff(i, arr):
    arr[:]=i
    return
def print_error(err):
    print(err)
if __name__ == '__main__':
    idx = [0,1,2,3]
    p = multiprocessing.Pool(4)
    # Shared memory Array
    shared = multiprocessing.Array('d', 4)
    arr = np.ctypeslib.as_array(shared.get_obj())
    for i in idx:
        p.apply(do_stuff, args=(i,arr[i:i+1]))
    p.close()
    p.join()
    print(arr)
This also prints all zeros for arr. However, when I define the array outside main and use pool.map, the code works. For example, the following code works:
Example 3:
import numpy as np
import multiprocessing
shared = multiprocessing.Array('d', 4)
arr = np.ctypeslib.as_array(shared.get_obj())
def do_stuff(i):
    arr[i]=i
    return
def print_error(err):
    print(err)
if __name__ == '__main__':
    idx = [0,1,2,3]
    p = multiprocessing.Pool(4)
    shared = multiprocessing.Array('d', 4)
    p.map(do_stuff, idx)
    p.close()
    p.join()
    print(arr)
This prints [0,1,2,3].
I am very confused by all this. My questions are:
When I define arr = np.zeros(4), which processor owns this variable? When I then send a slice of this array to different processors, what is being sent if this variable is not defined on those processors?
Why doesn't example 2 work while example 3 does?
I am working on Linux with Python 3.7.4.
When I define arr = np.zeros(4), which processor owns this variable?
Only the main process should have access to this. If you use "fork" for the start method, everything will be accessible to the child process, but as soon as something tries to modify it, it will be copied to its own private memory space before being modified (copy on write). This reduces overhead if you have large read-only arrays, but doesn't help you much for writing data back to those arrays.
what is being sent if this variable is not defined on those processors.
A new array is created within the child process when the arguments are reconstructed after being sent from the main process via a pipe and pickle. The data is serialized to bytes and reconstructed, so no information other than the values in the slice remains. It's a totally new object.
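The copy can be seen without any worker processes at all. The following is a small sketch that imitates the argument transfer with an explicit pickle round trip; the real Pool also sends the bytes through a pipe, which is left out here, and the variable names are only illustrative.

```python
import pickle
import numpy as np

arr = np.zeros(4)
view = arr[1:2]  # in this process, the slice is a view that shares memory with arr

# Roughly what a worker receives: the slice is pickled, sent, and rebuilt.
received = pickle.loads(pickle.dumps(view))
received[:] = 1  # modifies only the reconstructed copy

shares = np.shares_memory(arr, received)  # the rebuilt array has its own buffer
```

After this runs, arr is still all zeros, which is the same effect seen in examples 1 and 2.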
Why doesn't example 2 work while example 3 does?
Example 3 works because at the time of "fork" (the moment you call Pool), arr has already been created, and will be shared. It's also important that you used an Array to create it, so when you attempt to modify the data, the data is shared (the exact mechanics of this are complicated).
Example 2 does not work for the same reason example 1 does not work: you pass a slice of an array as an argument, which gets converted into a totally new object, so arr inside your do_stuff function is just a copy of arr[i:i+1] from the main process. It is still important to create anything which will be shared between processes before calling Pool (if you're relying on "fork" to share the data), but that's not why this example doesn't work.
You should know: example 3 only works because you're on Linux, where the default start method is fork. This is not the preferred start method due to the possibility of deadlocks when lock objects are copied in a locked state. This will not work on Windows at all, and won't work on macOS by default on 3.8 and above.
The best (most portable) solution to all this is to pass the Array itself as the argument, and reconstruct the numpy array inside the child process. This has the complication that "shared objects" can only be passed as arguments at the creation of the child process. This isn't as big a deal if you use Process, but with Pool, you basically have to pass any shared objects as arguments to an initialization function, and get the reconstructed array as a global variable of the child's scope. In this example, for instance, you will get an error trying to pass buf as an argument with p.map or p.apply, but not when passing buf as initargs=(buf,) to Pool():
import numpy as np
from multiprocessing import Pool, Array
def init_child(buf):
    global arr #use global context (for each process) to pass arr to do_stuff
    arr = np.frombuffer(buf.get_obj(), dtype='d')
def do_stuff(i):
    global arr
    arr[i]=i
if __name__ == '__main__':
    idx = [0,1,2,3]
    buf = Array('d', 4)
    arr = np.frombuffer(buf.get_obj(), dtype='d')
    arr[:] = 0
    #"with" context is easier than writing "close" and "join" all the time
    with Pool(4, initializer=init_child, initargs=(buf,)) as p:
        for i in idx:
            p.apply(do_stuff, args=(i,)) #you could pass more args to get slice indices too
    print(arr)
With Python 3.8 and above there's a new module, shared_memory, which is better than Array or any of the other sharedctypes classes. It is a bit more complicated to use, and has some additional OS-dependent nastiness, but it's theoretically lower overhead and faster. If you want to go down the rabbit hole, I've written a few answers on the topic of shared_memory, and have recently been answering lots of questions on concurrency in general if you want to take a gander at my answers from the last month or two.
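As a rough sketch of the shared_memory idea, assuming Python 3.8 or newer: the example below stays in a single process and attaches a second handle by name, which is the step a child process would perform after being given the block's name. The function name write_then_read_by_name is made up for this illustration, and struct is used in place of NumPy only to keep the example dependency-free.

```python
import struct
from multiprocessing import shared_memory

def write_then_read_by_name(values):
    """Write doubles into a shared block, then read them back through a second handle."""
    fmt = f"{len(values)}d"
    owner = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    try:
        struct.pack_into(fmt, owner.buf, 0, *values)
        # Attaching by name is what a child process would do after receiving owner.name.
        other = shared_memory.SharedMemory(name=owner.name)
        try:
            return list(struct.unpack_from(fmt, other.buf, 0))
        finally:
            other.close()   # every handle must be closed
    finally:
        owner.close()
        owner.unlink()      # the creator is responsible for freeing the block

result = write_then_read_by_name([0.0, 1.0, 2.0, 3.0])
```

With NumPy, the same buffer could be wrapped as np.ndarray(shape, dtype, buffer=shm.buf); in that case the array has to be deleted before close() is called, because close() refuses to run while the buffer is still exported.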

How do I implement a controlled Rx in Cirq/Tensorflow Quantum?

I am trying to implement a controlled rotation gate in Cirq/Tensorflow Quantum.
The readthedocs.io at https://cirq.readthedocs.io/en/stable/gates.html states:
"Gates can be converted to a controlled version by using Gate.controlled(). In general, this returns an instance of a ControlledGate. However, for certain special cases where the controlled version of the gate is also a known gate, this returns the instance of that gate. For instance, cirq.X.controlled() returns a cirq.CNOT gate. Operations have similar functionality Operation.controlled_by(), such as cirq.X(q0).controlled_by(q1)."
I have implemented
cirq.rx(theta_0).on(q[0]).controlled_by(q[3])
I get the following error:
~/.local/lib/python3.6/site-packages/cirq/google/serializable_gate_set.py in
serialize_op(self, op, msg, arg_function_language)
193 return proto_msg
194 raise ValueError('Cannot serialize op {!r} of type {}'.format(
--> 195 gate_op, gate_type))
196
197 def deserialize_dict(self,
ValueError: Cannot serialize op cirq.ControlledOperation(controls=(cirq.GridQubit(0, 3),), sub_operation=cirq.rx(sympy.Symbol('theta_0')).on(cirq.GridQubit(0, 0)), control_values=((1,),)) of type <class 'cirq.ops.controlled_gate.ControlledGate'>
I have the qubits and symbols initialized as:
q = cirq.GridQubit.rect(1, 4)
symbol_names = x_0, x_1, x_2, x_3, theta_0, theta_1, z_2, z_3
I do re-use the circuits with various circuits.
My question: How do I properly implement a controlled Rx in Cirq/Tensorflow Quantum?
P.S. I can't find a tag for Google Cirq
Follow up:
How does this generalize to the similar situations of Controlled Ry and controlled Rz?
For Rz I found a gate decomposition at https://threeplusone.com/pubs/on_gates.pdf, involving H.on(q1), CNOT(q0, q1), H.on(q2), but this is not yet a CRz with an arbitrary angle. Would I introduce the angle before the H?
For Ry, I have not found a decomposition yet, nor one for CRy.
What you have is a completely correct implementation of a controlled X rotation in Cirq. It can be used in simulation and other things like cirq.unitary without any issues.
TFQ only supports a subset of gates in Cirq. For example, a cirq.ControlledGate can have an arbitrary number of control qubits, which in some cases can make it harder to decompose down to primitive gates that are compatible with NISQ hardware platforms (this is why cirq.decompose doesn't do anything to ControlledOperations). TFQ only supports these primitive-style gates. For a full list of the supported gates, you can do:
tfq.util.get_supported_gates().keys()
In your case it is possible to come up with a simpler implementation of this gate. First we can note that cirq.rx(some angle) is equal to cirq.X**(some angle / pi) offset by a global phase:
>>> a = cirq.rx(0.3)
>>> b = cirq.X**(0.3 / np.pi)
>>> cirq.equal_up_to_global_phase(cirq.unitary(a), cirq.unitary(b))
True
Let's move to using X now. Then the operation we are after is:
>>> qs = cirq.GridQubit.rect(1,2)
>>> a = (cirq.X**0.3)(qs[0]).controlled_by(qs[1])
>>> b = cirq.CNOT(qs[1], qs[0]) ** 0.3  # control qs[1], target qs[0], matching a
>>> cirq.equal_up_to_global_phase(cirq.unitary(a), cirq.unitary(b))
True
Since cirq.CNOT is in the TFQ supported gates it should be serializable without any issues. If you want to make a symbolized version of the gate you can just replace the 0.3 with a sympy.Symbol.
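One caveat is worth checking numerically: a global phase on a gate turns into a relative phase once the gate is controlled. The sketch below uses plain NumPy rather than Cirq, with the matrices written out by hand under the assumption that rx(theta) = exp(-i*theta*X/2) and that X**t has eigenvalues 1 and exp(i*pi*t) (the convention behind cirq.X**t). The helper names rx, x_pow and controlled are made up for this illustration.

```python
import numpy as np

def rx(theta):
    """Rotation about X, assumed to be exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def x_pow(t):
    """X**t with eigenvalues 1 and exp(i*pi*t) (assumed convention)."""
    p = np.exp(1j * np.pi * t)
    return 0.5 * np.array([[1 + p, 1 - p], [1 - p, 1 + p]])

def controlled(u):
    """Two-qubit controlled-u, with the control as the first qubit."""
    out = np.eye(4, dtype=complex)
    out[2:, 2:] = u
    return out

theta = 0.3

# Single-qubit case: the two gates differ only by the global phase exp(i*theta/2).
single_ok = np.allclose(x_pow(theta / np.pi), np.exp(1j * theta / 2) * rx(theta))

# Controlled case: that phase now sits only on the control=1 block.
crx = controlled(rx(theta))
cx_pow = controlled(x_pow(theta / np.pi))
plain_match = np.allclose(crx, cx_pow)

# Undoing the phase on the control qubit recovers the controlled rx exactly.
phase_fix = np.kron(np.diag([1, np.exp(-1j * theta / 2)]), np.eye(2))
fixed_match = np.allclose(crx, phase_fix @ cx_pow)
```

Under these assumptions, the fractional CNOT differs from a controlled rx by a phase on the control qubit, which in Cirq terms would correspond to an extra cirq.Z(control)**(-theta/(2*np.pi)). Whether that phase matters depends on the rest of the circuit.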
Answer to follow up: If you want to do a CRz you can do the same thing you did above, swapping out the CNOT gate for the CZ gate. For CRy it's not as easy. For that I would recommend doing some combination of: cirq.Y(0) and cirq.YY(0, 1).
Edit: tfq-nightly builds and likely releases after 0.4.0 now include support for arbitrary controlled gates. So on these versions of tfq you could also do things like cirq.Y(...).controlled_by(...) to achieve the desired result now too.

Size of Scala array by byte [duplicate]

I know how to find the file size in Scala. But how do I find the size of an RDD/DataFrame in Spark?
Scala:
object Main extends App {
  val file = new java.io.File("hdfs://localhost:9000/samplefile.txt").toString()
  println(file.length)
}
Spark:
val distFile = sc.textFile(file)
println(distFile.length)
But when I run this, I do not get the file size. How do I find the RDD size?
If you are simply looking to count the number of rows in the rdd, do:
val distFile = sc.textFile(file)
println(distFile.count)
If you are interested in the bytes, you can use the SizeEstimator:
import org.apache.spark.util.SizeEstimator
println(SizeEstimator.estimate(distFile))
https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/SizeEstimator.html
Yes, I finally found a solution.
Include these imports:
import org.apache.spark.sql.Row
import org.apache.spark.rdd.RDD
import org.apache.spark.rdd
How to find the RDD Size:
def calcRDDSize(rdd: RDD[String]): Long = {
  rdd.map(_.getBytes("UTF-8").length.toLong)
     .reduce(_+_) //add the sizes together
}
Function to find DataFrame size:
(This function just convert DataFrame to RDD internally)
val dataFrame = sc.textFile(args(1)).toDF() // you can replace args(1) with any path
val rddOfDataframe = dataFrame.rdd.map(_.toString())
val size = calcRDDSize(rddOfDataframe)
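The map-then-reduce logic of calcRDDSize can be tried out without a cluster. This is a plain-Python sketch of the same idea, not Spark code: total_utf8_bytes is a hypothetical stand-in that sums the UTF-8 byte length of each line. Line terminators are not counted, so the total would be somewhat smaller than the size of the file the lines came from.

```python
def total_utf8_bytes(lines):
    """Sum the UTF-8 encoded length of every line (the per-line map, then the reduce)."""
    return sum(len(line.encode("utf-8")) for line in lines)

# "héllo" has 5 characters but 6 bytes in UTF-8, so byte size and string length differ.
sample = ["hello", "héllo", ""]
total = total_utf8_bytes(sample)
```

Counting bytes rather than characters is the reason the Scala version calls getBytes("UTF-8") instead of using the string length.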
Below is another approach besides SizeEstimator, which I use frequently.
It lets you find out from code whether an RDD is cached and, more precisely, how many of its partitions are cached in memory and how many on disk. It also gives the storage level, the current actual caching status, and the memory consumption.
SparkContext has a developer API method, getRDDStorageInfo().
You can use it for this on occasion. Its documentation says:
Return information about what RDDs are cached, if they are in mem or
on disk, how much space they take, etc.
For Example :
scala> sc.getRDDStorageInfo
res3: Array[org.apache.spark.storage.RDDInfo] =
Array(RDD "HiveTableScan [name#0], (MetastoreRelation sparkdb,
firsttable, None), None " (3) StorageLevel: StorageLevel(false, true, false, true, 1); CachedPartitions: 1;
TotalPartitions: 1;
MemorySize: 256.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B)
It seems the Spark UI uses the same information.
See the issue SPARK-17019, which describes:
Description
With SPARK-13992, Spark supports persisting data into
off-heap memory, but the usage of off-heap is not exposed currently,
it is not so convenient for user to monitor and profile, so here
propose to expose off-heap memory as well as on-heap memory usage in
various places:
Spark UI's executor page will display both on-heap and off-heap memory usage.
REST request returns both on-heap and off-heap memory.
Also these two memory usage can be obtained programmatically from SparkListener.

TensorFlow - Cannot get the shape of matrix with the get_shape command

I can't seem to get the shape of the tensor when I do
get_shape().as_list()
Here is the code I have written:
matrix1 = tf.placeholder(tf.int32)
matrix2 = tf.placeholder(tf.int32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a = sess.run(matrix1, {matrix1: [[1,2,3],[4,5,6],[7,8,9]]})
    b = sess.run(matrix2, {matrix2: [[10,11,12],[13,14,15], [16,17,18]]})
    print(a.get_shape().as_list()) #ERROR
I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'get_shape'
I want to know the shape of the matrix so that I can take in an arbitrary matrix and loop through its rows and columns.
Just summarizing the discussion in the comments, with a few notes:
Both matrix1 and a are multidimensional arrays, but there is a difference:
matrix1 is an instance of tf.Tensor, which supports two ways to access the shape: matrix1.shape attribute and matrix1.get_shape() method.
The result of tf.Tensor evaluation, a, is a numpy ndarray, which has just a.shape attribute.
Historically, tf.Tensor had only the get_shape() method; shape was added later to make it similar to numpy. One more note: in TensorFlow, a tensor's shape can be dynamic (as in your example), in which case neither get_shape() nor shape will return a number. In this case, one can use the tf.shape function to access it at runtime (here's an example when it might be useful).
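For the evaluated result, which is a numpy ndarray, a short sketch of reading the shape and looping over rows and columns could look like this. The array is built directly with NumPy here so that the example does not need a TensorFlow session; the values are the ones from the question.

```python
import numpy as np

# Stand-in for the ndarray that sess.run returns.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows, cols = a.shape  # ndarrays expose .shape, not get_shape()

# Visit every element by row and column index.
visited = [(i, j, int(a[i, j])) for i in range(rows) for j in range(cols)]
```

The same unpacking works for any 2-D array, so the loop bounds follow whatever matrix is fed in.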
