Dynamic programming with Data.Vector

I am using Data.Vector and currently need to compute the contents of a vector for use in computing a cryptographic hash (SHA-1). I wrote the following code:
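import qualified Data.Vector
import Data.Vector (Vector, (//))

dynamic :: a -> Int -> (Int -> Vector a -> a) -> Vector a
dynamic e n f =
  let
    start = Data.Vector.replicate n e
  in step start 0
  where
    -- each (//) copies the whole vector, so filling n entries costs O(n^2)
    step vector i = if i == n then vector
                    else step (vector // [(i, f i vector)]) (i + 1)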
I wrote this so that the function f that fills out the vector has access to the partial results along the way. Surely something like this must already exist in Data.Vector, no?
The problem statement is the following: you have to solve a dynamic programming problem whose finished result is an array. You know the size of the array, and you have a recursive function for filling it out.

You probably already saw the function generate, which takes a size n and a function f of type Int -> a and then produces a Vector a of size n. What you probably weren't aware of is that when using this function you actually do have access to the partial results.
What I mean to say is that inside the function you pass to generate you can refer to the vector you're defining and due to Haskell's laziness it will work fine (unless you make it so that the different items of the vector depend on each other in a circular fashion, of course).
Example:
import Data.Vector (Vector, generate, (!))

tenFibs :: Vector Integer
tenFibs = generate 10 fib
  where fib 0 = 0
        fib 1 = 1
        fib n = tenFibs ! (n-1) + tenFibs ! (n-2)
tenFibs is now a vector containing the first 10 Fibonacci numbers.
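Applied to the signature from the question, the same trick might look like the sketch below. The name dynamicLazy is hypothetical, and the sketch assumes the boxed Data.Vector, whose elements stay unevaluated until they are read; with Data.Vector.Unboxed the elements are forced while the vector is built, so the self-reference would not work. Unlike the // version, no placeholder element is needed and nothing is copied per step.

```haskell
import qualified Data.Vector as V
import Data.Vector (Vector, (!))

-- Hypothetical lazy replacement for the question's `dynamic`.
-- The vector being defined is passed to f, so f i may read any entry
-- that does not (directly or indirectly) depend on entry i itself.
-- Assumes the boxed Data.Vector: its elements stay unevaluated thunks
-- until they are read, which is what makes the self-reference work.
dynamicLazy :: Int -> (Int -> Vector a -> a) -> Vector a
dynamicLazy n f = v
  where
    v = V.generate n (\i -> f i v)

-- Example recurrence: each Fibonacci number reads the two entries before it.
fibs :: Int -> Vector Integer
fibs n = dynamicLazy n step
  where
    step i v
      | i < 2 = fromIntegral i
      | otherwise = v ! (i - 1) + v ! (i - 2)
```

For example, fibs 10 holds the same ten numbers as tenFibs above.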

Maybe you could use one of Data.Vector's scan functions?
http://hackage.haskell.org/packages/archive/vector/0.6.0.2/doc/html/Data-Vector.html#32
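As a sketch of what the library offers here (check the documentation of your installed vector version, since the release linked above is quite old): constructN is close to the question's dynamic, because its generator receives the already-built prefix. The scans and iterateN only pass along the previous accumulator, which is enough when each element depends on a fixed amount of state. Fibonacci numbers stand in for the real recurrence, and the function names are made up for illustration.

```haskell
import qualified Data.Vector as V
import Data.Vector (Vector, (!))

-- constructN n g builds the vector left to right: g is given the part
-- built so far (lengths 0, 1, ..., n-1) and returns the next element.
fibsConstruct :: Int -> Vector Integer
fibsConstruct n = V.constructN n next
  where
    next prefix
      | len < 2 = fromIntegral len
      | otherwise = prefix ! (len - 1) + prefix ! (len - 2)
      where
        len = V.length prefix

-- iterateN only threads the previous state forward, here a pair of
-- consecutive Fibonacci numbers, so the state must carry everything needed.
fibsIterate :: Int -> Vector Integer
fibsIterate n = V.map fst (V.iterateN n (\(a, b) -> (b, a + b)) (0, 1))

-- scanl exposes the running accumulator, e.g. prefix sums (length n + 1).
prefixSums :: Vector Int -> Vector Int
prefixSums = V.scanl (+) 0
```

constructN is the better fit when f needs arbitrary earlier entries; a scan or iterateN is simpler when a small state suffices.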

Numpy for-loop runtime too long

I have a problem with the runtime of my code. The only part that is really slow is my for-loop over every element of a (144, 208) array.
I have to check whether each element fulfills a condition and, if so, perform several actions, like shifting another (144, 208) array and adding it to an existing one.
Is this unavoidable, or is my implementation just too beginner-like?
Here is my code:
# With this code block I load a specific image into Python and binarize it
g = Initialization()
b_init = g.initialize_grid(".\\geometries\\1.png")

# This function expands the matrix m_sp, which I load from a CSV file
def expand_blockavg(x, h, w):
    m, n = x.shape
    return np.broadcast_to((x/float(h*w))[:, None, :, None], (m, h, n, w)).reshape(m*h, -1)

m_adapt = expand_blockavg(m_sp, 16, 16) / 256

# This is my actual calculation block
for index, x in np.ndenumerate(b_init):
    if x == 1:
        a = np.asarray(index)
        y = np.subtract(a, index_default)
        m_shift = shift(m_adapt, (y[0], y[1]), cval=0)
        b = np.add(m_shift, b)
So the last block (the calculation) is what takes so long. I know the loop has to check about 30k elements (144 * 208 = 29,952), but I thought it would be faster with NumPy.
Can someone tell me whether there's potential for optimization, or do I have to live with it taking this long?
Thanks
Iteration in python is very slow compared to vectorized numpy operations.
An immediate optimization is to only iterate over the indices where the matrix is 1, rather than checking each index. Do this with:
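import numpy as np
from scipy.ndimage import shift  # assumption: the question's shift is scipy.ndimage.shift

# np.argwhere returns one row of indices for each element equal to 1
indices = np.argwhere(b_init == 1)
for a in indices:
    y = np.array(a) - index_default
    m_shift = shift(m_adapt, y[:2], cval=0)
    b += m_shift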
Not knowing the details of shift it’s hard to say if you can vectorize that also. I replaced function calls with equivalent operations which should be faster; np.add etc. are mostly useful when the operation is being selected programmatically.

Network Formation and Large Arrays in Matlab Optimization

I am getting an error using repmat. My Matlab version is 2017a. "Requested 3711450x2726 (75.4GB) array exceeds maximum array size..." First, some context.
I have an adjacency matrix of social network data; call it D. D is 2725x2725, with 1s denoting a link between agents i and j and 0s otherwise. I have been provided a function and sub-functions for a network formation model. There are K regressors (x variables). The model requires forming a dyad-specific regressor matrix W of size 0.5N(N-1) x K. In my data, this is 3711450 x K. To start, I select only one x variable, so K=1.
In the main function, there are two steps. The first step calculates the joint MLE from a logit. The array-size problem arises in the second step, the computation of the variance-covariance matrix. Inside this step, there is a calculation that creates a 3711450 x n (2725) matrix using repmat.
INFO = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
exp_Xbeta is 3711450 x K and X is a sparse 3711450 x 2725 matrix with Bytes = 178171416 of class double. The error occurs at INFO.
I've tried converting X to a tall array, but so far no joy. I've tried adding sparse to the INFO line, but again no joy. Does anyone have any ideas short of going to a cluster or getting more RAM? Could I somehow convert X from a sparse matrix to a full matrix inside a datastore and then call the datastore using tall? I have not been able to figure out how to do that, if it is possible.
Once INFO is constructed as an array it will be used later in one of the sub-functions. So, it needs to be callable. In case you're curious, INFO is the second derivative matrix.
I have found that producing the INFO matrix all at once was too much for my memory constraints. I split up the steps, but still, repmat and subsequent steps were a problem. Now, I've turned to building up the INFO matrix one step at a time, while never holding more than exp_Xbeta, X, and two vectors in memory. Replacing the construction of INFO with
for i = 1:d
    s1_i = step1(:,1).*X(:,i);
    s1_i = s1_i';
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j);
    end
    clear s1_i;
end
has dropped the memory requirement, though it's slow, and things seem to be working. For anyone interested, below is a small example illustrating the point.
clear all
N = 20;
n = 0.5*N*(N-1);
exp_Xbeta = rand(n,1);
X = rand(n,N);
step1 = (exp_Xbeta ./ (1+exp_Xbeta).^2);
[c,d] = size(X);
INFO = zeros(d,d);
% Loop version: build INFO one entry at a time
for i = 1:d
    s1_i = step1(:,1).*X(:,i);
    s1_i = s1_i';
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j);
    end
    clear s1_i
end
% Original repmat version, for comparison
K = 1;
INFO2 = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
% Methods produce equivalent matrices
INFO
INFO2

How to create a matrix in OCaml?

I'm learning OCaml and am currently trying to understand how iteration works in OCaml and how to create a matrix. I want a 5 x 5 array filled with 0. I know there is an issue with shared references, so I create a new array at each iteration; however, I'm having issues in other places, specifically at line 6. Let me know about other issues too, such as indentation practices.
open Array;;
let n = ref 5 and i = ref 0 in
let m = Array.make !n 0 in
while !i < !n do
  m.(!i) <- Array.make !n 0;;
  i := !i + 1;;
done
m;;
You are using ;; too much. Contrary to popular belief, ;; is not part of ordinary OCaml syntax (in my opinion anyway). It's just a special way to tell the toplevel (the REPL) that you want it to evaluate what you've typed so far.
Leave the ;; after open Array. But change all but the last ;; to ; instead.
(Since you reference the Array module by name in your code, which IMHO is good style, you can also just leave out the open Array;; altogether.)
You want the last ;; because you do want the toplevel to evaluate what you've typed so far.
Your syntax error is caused by the fact that your overall code is like this
let ... in
let ... in
while ... do
...
done
m
The while is one expression (in OCaml everything is an expression) and m is another expression. If you want to have two expressions in a row you need ; between them. So you need ; after done.
You also have a type error. When you create m you're creating an array of ints (your given initial value is 0). So you can't make it into a matrix (an array of arrays) later in the code.
Also (not trying to overload you with criticisms :-) this code reads like imperative code. It's not particularly idiomatic OCaml code. In most people's code, using ref is pretty rare. One immediate improvement I see would just be to say let n = 5. You're not changing the value of n anywhere that I see (though maybe this is part of a larger chunk of code). Another improvement would be to use for instead of while.
Finally, you can do this entire operation in one function call:
let n = 5 in
let m = Array.init n (fun _ -> Array.make n 0) in
m
Using explicit loops is actually also quite rare in OCaml (at least in my code).
Or you could try this:
let n = 5 in
let m = Array.make_matrix n n 0 in
m

Haskell : Increment index in a loop

I have a function that calculates f(n) in Haskell.
I have to write a loop so that it will start calculating values from f(0) to f(n), and will every time compare the value of f(i) with some fixed value.
I am an expert in OOP, hence I am finding it difficult to think in the functional way.
For example, I have to write something like
while (number < f(i))
i++
How would I write this in Haskell?
The standard approach here is
Create an infinite list containing all values of f(n).
Search this list until you find what you're after.
For example,
takeWhile (number <) $ map f [0..]
If you want to give up after you reach "n", you can easily add that as a separate step:
takeWhile (number <) $ take n $ map f [0..]
or, alternatively,
takeWhile (number <) $ map f [0 .. n]
You can do all sorts of other filtering, grouping and processing in this way. But it requires a mental shift. It's a bit like the difference between writing a for-loop to search a table, versus writing an SQL query. Think about Haskell as a bit like SQL, and you'll usually see how to structure your code.
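To make that concrete, here is a small sketch with a made-up f (any Int -> Int function works the same way). valuesWhile collects what the loop would see, and exitIndex uses the Prelude's until to find the value of i at which the imperative loop while (number < f(i)) i++ would stop. Both names are hypothetical.

```haskell
-- Hypothetical f, made up for illustration: a decreasing function.
f :: Int -> Int
f i = 100 - i * i

-- Every value f 0, f 1, ... the loop sees while (number < f i) still holds.
valuesWhile :: Int -> [Int]
valuesWhile number = takeWhile (number <) (map f [0 ..])

-- The value of i when `while (number < f(i)) i++` stops, starting at i = 0.
-- Like the imperative loop, this never returns if the condition never fails.
exitIndex :: Int -> Int
exitIndex number = until (\i -> not (number < f i)) (+ 1) 0
```

With this f, valuesWhile 50 is [100,99,96,91,84,75,64,51] and exitIndex 50 is 8.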
You can generate the list of all i such that f i is larger than your number:
[ i | i<-[0..] , f i > number ]
Then, you can simply take the first one, if that's all you want:
head [ i | i<-[0..] , f i > number ]
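If you would rather search a bounded range and get Nothing, instead of an endless search or a crash, when no i qualifies, listToMaybe from Data.Maybe fits nicely. The f and the name firstAbove are made up for illustration:

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical f, made up for illustration: an increasing function.
f :: Int -> Int
f i = i * i

-- First i in [0 .. n] with f i > number, or Nothing if there is none.
-- The upper bound n guarantees termination, and listToMaybe avoids
-- the runtime error head raises on an empty list.
firstAbove :: Int -> Int -> Maybe Int
firstAbove n number = listToMaybe [ i | i <- [0 .. n], f i > number ]
```

Here firstAbove 100 50 gives Just 8, while firstAbove 5 50 gives Nothing.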
Often, many idiomatic loops in imperative programming can be rephrased as list comprehensions, or expressed through map, filter, foldl, foldr. In the general case, when the loop is more complex, you can always exploit recursion instead.
Keep in mind that a "blind" translation from imperative to functional programming will often lead to non-idiomatic, hard-to-read code, as would be the case when translating in the opposite direction. Still, I find it reassuring that such a translation is always possible.
If you are new to functional programming, I would advise against learning it by translating what you know about imperative programming. Rather, start from scratch following a good book (LYAH is a popular choice).
The first thing that's weird from a functional point of view is that it's unclear what the result of your computation is. Do you care about the final result of f (i)? Perhaps you care about i itself. Without side effects, everything needs to have a value.
Let's assume you want the final value of the function f (i) as soon as some comparison fails. You can simulate your own while loops using recursion and guards!
while :: Int -> Int -> (Int -> Int) -> Int
while start number f
  | val >= number = val
  | otherwise     = while (start + 1) number f
  where
    val = f start
Instead of explicit recursion, you can use until, e.g.:
findGreaterThan :: (Int -> Int) -> Int -> Int -> (Int, Int)
findGreaterThan f x0 limit = until (\(v, _) -> v >= limit) (\(v, i) -> (f v, i + 1)) (x0, 0)
This returns a pair containing the first value for which the stopping condition holds and the number of times the given function was applied.

Iterating with respect to two variables in haskell

OK, continuing with my work on the Project Euler problems: I am still just beginning to learn Haskell, and programming in general.
I need to find the lowest number that is divisible by every number from 1 to 20.
So I started with:
divides :: Int -> Int -> Bool
divides d n = rem n d == 0

divise n a | divides n a == 0 = n : divise n (a+1)
           | otherwise        = n : divise (n+1) a
What I want to happen is for it to keep moving up for values of n until one magically is evenly divisible by [1..20].
But this doesn't work, and now I'm stuck on where to go from here. I assume I need to use:
[1..20]
for the value of a but I don't know how to implement this.
Well, having recently solved the Euler problem myself, I'm tempted to just post my answer for that, but for now I'll abstain. :)
Right now, the flow of your program is a bit chaotic, to sound like a feng-shui person. Basically, you're trying to do one thing: increment n until 1..20 divides n. But really, you should view it as two steps.
Currently, your code is saying: "if a doesn't divide n, increment n. If a does divide n, increment a". But that's not what you want it to say.
You want (I think) to say "increment n, and see if it divides [Edit: with ALL numbers 1..20]. If not, increment n again, and test again, etc." What you want to do, then, is have a sub-test: one that takes a number, and tests it against 1..20, and then returns a result.
Hope this helps! Have fun with the Euler problems!
Edit: I really, really should remember all the words.
Well, as an algorithm, this kinda sucks.
Sorry.
But you're getting misled by the list. I think what you're trying to do is iterate through all the available numbers, until you find one that everything in [1..20] divides. In your implementation above, if a doesn't divide n, you never go back and check b < a for n+1.
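An easy implementation of your algorithm would be:
import Data.List (find)
lcmAll :: [Int] -> Maybe Int
lcmAll nums = find (\n -> all (`divides` n) nums) [1..]
(using find from Data.List, all from the Prelude, and the divides function from the question; note the backticks, since divides takes the divisor first).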
A better algorithm would be to compute the LCMs pairwise, using foldl:
lcmAll :: [Int] -> Int
lcmAll = foldl lcmPair 1

lcmPair :: Int -> Int -> Int
lcmPair a b = lcmPair' a b
  -- step through the multiples of a and of b until they meet
  where lcmPair' a' b' | a' < b'   = lcmPair' (a + a') b'
                       | a' > b'   = lcmPair' a' (b + b')
                       | otherwise = a'
Of course, you could use the lcm function from the Prelude instead of lcmPair.
This works because the least common multiple of any set of numbers is the same as the least common multiple of [the least common multiple of two of those numbers] and [the rest of the numbers]
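Using the Prelude's lcm directly, the fold is a one-liner, and scanl with the same function shows the intermediate results, which makes the pairwise argument visible. The names lcmAll and runningLcms are just for illustration:

```haskell
-- Fold the Prelude's lcm over the list, pairwise from the left.
lcmAll :: [Integer] -> Integer
lcmAll = foldl lcm 1

-- Same fold, but keeping every intermediate LCM.
runningLcms :: [Integer] -> [Integer]
runningLcms = scanl lcm 1
```

For instance, runningLcms [1 .. 10] is [1,1,2,6,12,60,60,420,840,2520,2520]; folding in 6 and 10 leaves the running LCM unchanged because they already divide it.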
The function divise never stops: it doesn't have a base case. Both branches call divise, so they are both recursive. You're also using the function divides as if it returned an Int (like rem does), but it returns a Bool.
I see you have already started to divide the problem into parts, this is usually good for understanding and making it easier to read.
Another thing that can help is to write out the types of the functions. If your function works but you're not sure of its type, try :i myFunction in ghci. Here I've fixed the type error in divise (although other errors remain):
*Main> :i divise
divise :: Int -> Int -> [Int] -- Defined at divise.hs:4:0-5
Did you want it to return a list?
Leaving you to solve the problem, try to further divide it into parts. Here's a naive way to do it:
1. A function that checks whether one number is evenly divisible by another. This is your divides function.
2. A function that checks whether a number is divisible by all the numbers [1..20].
3. A function that iterates over all numbers and tries each one with the function in #2.
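One way to fill in those three steps, kept deliberately naive. The helper names other than divides are hypothetical, and divides keeps the question's argument order (divisor first):

```haskell
-- Step 1: does d divide n evenly? (Divisor first, as in the question.)
divides :: Int -> Int -> Bool
divides d n = n `rem` d == 0

-- Step 2: is n divisible by every number from 1 to k?
divisibleByAll :: Int -> Int -> Bool
divisibleByAll k n = all (`divides` n) [1 .. k]

-- Step 3: try n = 1, 2, 3, ... and return the first one that passes step 2.
-- Naive brute force: fine for small k, slow for k = 20.
smallestDivisibleByAll :: Int -> Int
smallestDivisibleByAll k = head (filter (divisibleByAll k) [1 ..])
```

smallestDivisibleByAll 10 returns 2520 quickly, but for k = 20 the search has to walk through hundreds of millions of candidates, which is why the LCM-based approaches are preferable.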
Here's my quick, more Haskell-y approach, using your algorithm:
Prelude> let divisibleByUpTo i n = all (\x -> (i `rem` x) == 0) [1..n]
Prelude> take 1 $ filter (\x -> snd x == True) $ map (\x -> (x, divisibleByUpTo x 4)) [1..]
[(12,True)]
divisibleByUpTo returns True if the number i is divisible by every integer from 1 up to and including n, similar to your divides function.
The next line probably looks pretty difficult to a Haskell newcomer, so I'll explain it bit-by-bit:
Starting from the right, we have map (\x -> (x, divisibleByUpTo x 4)) [1..] which says for every number x from 1 upwards, do divisibleByUpTo x 4 and return it in a tuple of (x, divisibleByUpTo x 4). I'm using a tuple so we know which number exactly divides.
Left of that, we have filter (\x -> snd x == True); meaning only return elements where the second item of the tuple is True.
And at the leftmost of the statement, we take 1 because otherwise we'd have an infinite list of results.
This will take quite a long time for a value of 20. As others said, you need a better algorithm: consider how, for a value of 4, even though our "input" numbers were 1, 2, 3 and 4, the answer was just 3*4 = 12. Think about why 1 and 2 were "dropped" from the equation.
