Problem when trying to generate a pseudo-random array/matrix from normally distributed numbers - arrays

I am trying to time the generation of pseudo-random arrays in IPython, using random.gauss() and a list comprehension, in an Ubuntu terminal. After pausing for a while, the environment is killed and I am returned to the shell. I'm doing this to time the difference between a pure Python approach and NumPy.
I have tried this on an Ubuntu VM and on Windows.
import random
I = 5000
mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
I expected an array with a shape of 5000x5000; instead the process gets killed.

There is very large overhead to using standard Python for this kind of thing (after generating the matrix you still have to operate on it, right?).
Please use NumPy instead:
import numpy as np
q = np.random.normal(size=(5000,5000))
print(q.shape)
That was pretty much instant.
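If you want to measure the difference explicitly, a small sketch (not from the original answer; N is kept small enough that the pure Python version fits in memory) could use timeit:
import random
import timeit
import numpy as np

N = 1000  # small enough for the pure Python version to fit in memory

py_time = timeit.timeit(
    lambda: [[random.gauss(0, 1) for _ in range(N)] for _ in range(N)],
    number=3)
np_time = timeit.timeit(
    lambda: np.random.normal(size=(N, N)),
    number=3)

print(f"pure Python: {py_time:.2f} s, NumPy: {np_time:.2f} s")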

Related

Fill a bytearray of known size from generator

Starting from a previously given or computed very long array (MB, GB, TB) of small byte-sized numbers (so I use a bytearray), I need to compute a follow-up bytearray in the next iteration step. It is possible to compute the size needed for the next iteration's bytearray, so I can allocate the memory beforehand using one of the bytearray constructors:
# A is the current/former bytearray
# sizes of array: 1 -> 2 -> 8 -> 48 -> 480 -> 5_760 -> 92_160 -> 1_658_880 ->
# 36_495_360 -> 1_021_870_080 -> 30_656_102_400 [ -> 1_103_619_686_400 ... ]
ls = NextLenArray(A)
L = bytearray(ls)
# the generator creates new values from the currently existing ones
for i, j in enumerate(gen_values(A)):
    L[i] = j
# assign the result back to A for the next iteration step
A = L
Alternatively, it is obvious to create the next bytearray directly by using a generator expression. I don't know how and when (stepwise?) the memory reservation happens then.
A = bytearray(j for j in gen_values(A))
It looks like it runs a little faster, but watching the task manager it uses more memory while generating, and in later iteration steps it gets stopped one step earlier by a MemoryError.
Is there an easy way to combine the pre-reservation (allocating a bytearray of the needed size) with filling it from a generator/comprehension?
It looks like it runs a little faster, but watching the task manager it uses more memory while generating, and in later iteration steps it gets stopped one step earlier by a MemoryError.
This is because a generator does not have a known length. Python cannot iterate over the generator just to measure its length, because it would be consumed. So the bytearray has to be resized on the fly, more or less efficiently. Depending on the implementation (e.g. a dynamic array with a growing size, or a dynamic array of independent big chunks), this can require significantly more memory than allocating a bytearray directly at the right size. On my machine, with CPython 3.9.2, I cannot reproduce your problem because it uses a memory-efficient implementation.
Is there an easy way to combine the pre-reservation (allocating a bytearray of the needed size) with filling it from a generator/comprehension?
Yes, you can, using a chunk-based copy. Here is an example:
import itertools
ls = NextLenArray(A)
L = bytearray(ls)
gen = gen_values(A)
chunkSize = 65536
for i in range(0, ls, chunkSize):
    # Copy a chunk. This can (and does) allocate memory because of a
    # potential internal copy, but the amount is bounded by the chunk size.
    L[i:i+chunkSize] = itertools.islice(gen, chunkSize)
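For reference, here is a self-contained toy version of the same chunked copy, with hypothetical stand-ins for NextLenArray and gen_values (they are not defined in the original question):
import itertools

def NextLenArray(A):
    # hypothetical size rule: the next array is twice as long
    return 2 * len(A)

def gen_values(A):
    # hypothetical generator: emit each byte and its successor modulo 256
    for b in A:
        yield b
        yield (b + 1) % 256

A = bytearray(range(8))
ls = NextLenArray(A)
L = bytearray(ls)        # pre-reserve the full size up front
gen = gen_values(A)
chunkSize = 4
for i in range(0, ls, chunkSize):
    # fill one bounded chunk at a time from the generator
    L[i:i+chunkSize] = itertools.islice(gen, chunkSize)
A = L
print(list(A))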
Note that manipulating huge amounts of memory in pure Python is not efficient (especially with CPython). Consider using high-performance Python packages such as NumPy and Numba, or writing some parts in a native language like C or C++ (using for example Cython). Alternatively, you may be interested in using PyPy.

Sharing large read-only array and using the apply_async method of Python's multiprocessing.pool module

I developed a little program that computes several things on a large number of independent combinations of parameters. To that end, I've parallelized my code with the pool.apply_async() method of Python's multiprocessing module. I'm using apply_async because the parallelized function takes several arguments.
At some point in the parallelized function, I need to read the values of certain entries of a large hypercube. I've checked the memory activity on my computer, and it seems like the hypercube, passed as an argument to the function being parallelized, is copied into ALL the running child processes. I'd like to know if it is possible to share my hypercube with the function parallelized via apply_async without it being copied into every child process. I only need to read some values of the hypercube at each iteration, but there are a lot of iterations, so I need the memory access to be quick.
I've been searching for answers to my problem everywhere; a lot of people seem to have had similar issues. Most of them use multiprocessing.Array, but I just can't make it work with the apply_async method: the hypercube is still copied everywhere...
I'm running my code on macOS Catalina, so I guess I am not subject to the problems caused by the lack of fork() on Windows.
Here's a little example that mimics what I'm trying to do:
import numpy as np
import ctypes
import time
import multiprocessing as mp
N = 10000 # size of array
M = np.arange(0, N*N) # large array (equivalent of my hypercube)
# Sharing memory and storing M
M_shared_base = mp.Array(ctypes.c_double, M.size)
M_shared = np.frombuffer(M_shared_base.get_obj())
M_shared[:] = M[:]
del M
params = np.array([-2,0.5,8,4,45]) # a bunch of parameters
def func(idx, parameters, def_param=M_shared):
    ''' Function to parallelize using the shared memory for M '''
    # Computes independent stuff on the parameters for every iteration
    # ...
    # ...
    time.sleep(10)
def func2(idx, parameters, matrix):
    ''' Function to parallelize NOT using the shared memory for M '''
    # Computes independent stuff on the parameters for every iteration
    # ...
    # ...
    time.sleep(10)
if __name__ == '__main__':
    print("\nStarting parallelization on 'func'...")
    pool = mp.Pool(4)
    output = [pool.apply_async(func, args=(i, params)) for i in range(4)]
    results = [r.get() for r in output]
    pool.close()
    pool.join()

    print("\nStarting parallelization on 'func2'...")
    pool = mp.Pool(4)
    output = [pool.apply_async(func2, args=(i, params, M_shared)) for i in range(4)]
    results = [r.get() for r in output]
    pool.close()
    pool.join()
When I execute this code and look at the memory, both func and func2 run 4 processes, each using around 780 MB. What I want is for the main process to use around 780 MB and the 4 child processes only a few MB, as the matrix is shared by the parent process.
Thank you very much to those who answer; I hope I'm not bothering you with something that has already been answered somewhere else.
A very pleasant evening to all ! :)
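One possible approach, sketched here and not taken from the original post, is the multiprocessing.shared_memory module (Python 3.8+): children attach to the block by name instead of receiving a pickled copy. The array contents and the worker logic below are only placeholders:
import numpy as np
from multiprocessing import Pool, shared_memory

N = 10000

def worker(idx, parameters, shm_name, shape, dtype):
    # attach to the existing shared block instead of copying the array
    shm = shared_memory.SharedMemory(name=shm_name)
    M = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    value = float(M[idx])   # read-only access to one entry
    shm.close()
    return value

if __name__ == '__main__':
    src = np.arange(0, N * N, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    M_shared = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    M_shared[:] = src[:]
    del src

    params = np.array([-2, 0.5, 8, 4, 45])
    with Pool(4) as pool:
        out = [pool.apply_async(worker,
                                args=(i, params, shm.name, M_shared.shape, M_shared.dtype))
               for i in range(4)]
        print([r.get() for r in out])

    shm.close()
    shm.unlink()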

Long writing time, short reading time from hdf5 file to numpy stack using Dask

I am doing an experiment with Dask which is a bit complicated, so I cannot really provide a code snippet. The experiment uses the Dask threaded scheduler with only one thread ('single-threaded') to load an array of about 5 GB from a single hdf5 file and to write it back into 50 npy files using dask array's to_npy_stack method. I am doing it in two parts, one buffer loading 2.5 GB and a second one loading 2.5 GB as well. The reading and writing are done on an HDD, hence my choice to use one thread. As we can see in the image below, the dask diagnostics indicate that the writing time in yellow (one task per write to one numpy file) is much longer than the reading part (the two blue tasks).
Does anyone have an idea why the reading is so much faster than the writing? As we can see from the bottom graph, I am loading 2.5 GB into the cache, so the decompression of the hdf5 file does not seem to be delayed. Is it possible that the to_npy_stack function is not well optimized?
PS: Yes, my dask diagnostics fail in the middle graph; I don't know why it depends on where I put it in my code. But anyway...
Edit: it seems that this is the usual writing time for numpy files, so maybe the reading speed is just greatly improved by the use of hdf5, even though it has to be decompressed?
You may be using slow compression on your numpy files (like gzip). I recommend profiling your code so that you understand what is causing the slowdown.
Because you use the single-threaded scheduler, any traditional Python profiling tool, like cProfile, should work fine.
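A minimal profiling sketch along those lines; the array here is a random stand-in for the hdf5-backed array, and the output directory name is made up:
import cProfile
import dask
import dask.array as da

# stand-in for the hdf5-backed array from the question (hypothetical shape and chunks)
x = da.random.random((1000, 1000), chunks=(100, 1000))

# run the write under the single-threaded scheduler and profile it
with dask.config.set(scheduler='single-threaded'):
    cProfile.runctx("da.to_npy_stack('npy_out/', x, axis=0)",
                    globals(), locals(), sort='cumtime')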

Guarantee loop takes a minimum of t seconds?

I'm writing code to query an online API, which restricts the number of times I can access it per 10 seconds. I would like to make my code as fast as possible, which means querying very close to the limit.
I'm wondering if there is any way to guarantee that an iteration of a for loop takes a minimum of t seconds. So, for example, if the code inside the loop takes n < t seconds, then the program will wait t-n seconds before iterating again.
Although I'm using Julia currently, I'm open to solutions in C++, Python or Java. Also, if there are other languages in which this is easier, I'm always willing to learn.
Many languages have getTickCount(), getFrequency() and sleep(ms) functions; you can string them together pretty easily:
while (doMoreQueries)
{
    startTick = getTickCount();
    // send query, do other things
    remainingMs = 10000 - (getTickCount() - startTick) * 1000 / getFrequency();
    if (remainingMs > 0)
        sleep(remainingMs);
}
Although I'm not familiar with Julia, in C++ you could look at using some of the features in chrono, with a sleep function like this.
Or in Julia ...
while (some_condition)
    start_time = time()
    # do your stuff
    sleep(max(0, 10 - (time() - start_time)))
end
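Since the question mentions Python as an option, here is a sketch of the same idea with time.monotonic; do_query and the loop count are placeholders:
import time

MIN_PERIOD = 10.0  # seconds per iteration, matching the rate-limit window

def do_query():
    # placeholder for the real API call
    time.sleep(1.5)

for _ in range(3):  # stand-in for the real loop condition
    start = time.monotonic()
    do_query()
    elapsed = time.monotonic() - start
    time.sleep(max(0.0, MIN_PERIOD - elapsed))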

Converting MATLAB code into C

I have a MATLAB function that reads a big matrix and calculates the singular value decomposition (SVD). However, I need to run it on a Linux system without installing MATLAB on every new machine, so I'd like to have it converted into C source code. The code is really simple:
function singular(m)
load c:\som\matlab.txt
[U,S,V]=svd(matlab);
m = str2num(m);
U1=U(:,1:floor(sqrt(m)));
V1=V';
Vt=V1(1:floor(sqrt(m)),:);
S1=S(1:floor(sqrt(m)),1:floor(sqrt(m)));
matlab1=U1*S1*Vt;
matlab2=abs(matlab1);
save c:\som\matlab1.txt matlab1 -ascii
save c:\som\matlab2.txt matlab2 -ascii
You can use MATLAB Coder, but I advise you to do it manually, because some functions are not convertible, and the performance is not much better than doing it manually.
To implement SVD manually: SVD
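Not part of the original answer, but for reference, the same computation expressed with NumPy (hypothetical file paths) shows exactly what a C implementation would need to reproduce:
import math
import numpy as np

def singular(m, in_path='matlab.txt', out1='matlab1.txt', out2='matlab2.txt'):
    # load the matrix, compute its SVD, and rebuild a rank-k approximation
    A = np.loadtxt(in_path)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = math.floor(math.sqrt(int(m)))
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    np.savetxt(out1, approx)
    np.savetxt(out2, np.abs(approx))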
