Nesting n-sets of for loops - (Julia) - loops

I am currently trying to create n-nested loops in order to compute a rather tricky sum.
Each term in the sum depends on the last, and therefore the nesting is required. Here is some Julia-written pseudo-code explaining what I'm trying to get:
# u,v are fixed integers
# m is a fixed, n-length array of integers
_
for i_1 in 0:min(m[1], u) |#Trio of Loops we want to repeat n times
for j_1 in 0:min(m[1], v) |
for t_1 in i_1:(i_1 + j_1) _|#1
_
for i_2 in 0:min(m[2], u - i_1) |
for j_2 in 0:min(m[2], v - j_1) |
for t_2 in i_2:(i_2 + j_2) _|#2
#... repeat until nth time...#
_
for i_n in 0:min(m[n],u - sum(#all i_'s up to n-1)) |
for j_n in 0:min(m[n],v - sum(#all j_'s up to n-1)) |
for t_n in i_n:(i_n + j_n) _|#n
#Sum over sum function which depends on i_'s and j_'s and t_'s
X += f(i_1,j_1,t_1)*f(i_2,j_2,t_2)*...*f(i_n,j_n,t_n)
I am completely unsure of how to properly code this for an arbitrary n-number of trios. If the number n is small then obviously this can be explicitly written, but if not, then we are stuck. I assume that there is a recursive way to employ this code, but I am unsure how to go about this.
Any and all help would be appreciated.

Here is a quick attempt at a slightly simplified version of your problem.
function inner(X, f, ms)
is = [Symbol(:i_, n) for n in 1:length(ms)]
js = [Symbol(:j_, n) for n in 1:length(ms)]
fs = [:($f($i, $j)) for (i,j) in zip(is, js)]
:($X += *($(fs...)))
end
inner(:X, :g, [3,4,5]) # :(X += g(i_1, j_1) * g(i_2, j_2) * g(i_3, j_3))
function looper(ex::Expr, ms, u)
isempty(ms) && return ex
n = length(ms)
m_n = ms[n]
i_n = Symbol(:i_,n)
j_n = Symbol(:j_,n)
out = :( for $i_n in 0:min($m_n, $u)
for $j_n in 0:min($m_n, $u)
$ex
end
end)
looper(out, ms[1:end-1], u)
end
looper(inner(:g, [4,5]), [4,5], 3)
# :(for i_1 = 0:min(4, 3)
# for j_1 = 0:min(4, 3)
# for i_2 = 0:min(5, 3)
# for j_2 = 0:min(5, 3)
# X += g(i_1, j_1) * g(i_2, j_2) ...
macro multi(X, f, ms, u)
ms isa Expr && ms.head == :vect || error("expected a literal vector like [1,2,3]")
looper(inner(esc(X), esc(f), ms.args), ms.args, esc(u))
end
#macroexpand #multi X f [4,5] u
# :(for var"#74#i_1" = 0:Main.min(4, u) ...
function multi(X, f, u)
#multi X f [4,5] u
X
end
multi(0.0, atan, 1)

Related

equivalent of numpy.c_ in julia

Hi I am going through the book https://nnfs.io/ but using JuliaLang (it's a self-challenge to get to know the language better and use it more often.. rather than doing the same old same in Python..)
I have come across a part of the book in which they have custom wrote some function and I need to recreate it in JuliaLang...
source: https://cs231n.github.io/neural-networks-case-study/
python
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
ix = range(N*j,N*(j+1))
r = np.linspace(0.0,1,N) # radius
t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
y[ix] = j
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()
my julia version so far....
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros((N*K, D))
y = zeros(UInt8, N*K)
# See https://docs.julialang.org/en/v1/base/math/#Base.range
for j in range(0,length=K)
ix = range(N*(j), length = N+1)
radius = LinRange(0.0, 1, N)
theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
X[ix] = ????????
end
notice the ??????? area because I am now trying to decipher if Julia has an equivalent for this numpy function
https://numpy.org/doc/stable/reference/generated/numpy.c_.html
Any help is appreciated.. or just tell me if I need to write something myself
This is a special object to provide nice syntax for column concatanation. In Julia this is just built into the language hence you can do:
julia> a=[1,2,3];
julia> b=[4,5,6];
julia> [a b]
3×2 Matrix{Int64}:
1 4
2 5
3 6
For your case the Julian equivalent of np.c_[r*np.sin(t), r*np.cos(t)] should be:
[r .* sin.(t) r .* cos.(t)]
To understand Python's motivation you can also have a look at :
numpy.r_ is not a function. What is it?
The equivalent of numpy.c_ would seem to be horizontal concatenation, which you can do with either the hcat function or with (e.g.) simply [a b]. Fixing a few other issues with the translation so far, we end up with
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros(N*K, D)
y = zeros(UInt8, N*K)
for j in range(0,length=K)
ix = (N*j+1):(N*(j+1))
radius = LinRange(0.0, 1, N)
theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
X[ix,:] .= [radius.*sin.(theta) radius.*cos.(theta)]
y[ix] .= j
end
# visualize the data:
using Plots
scatter(X[:,1], X[:,2], zcolor=y, framestyle=:box)

R - avoid nested for loops

I have the following function which takes 4 vectors. The T vector has a given length and all 3 other vectors (pga, Sa5Hz and Sa1Hz) have a given (identical but not necessarily equal to T) lenght.
The output is a matrix with length(T) rows and length(pga) columns.
My code below seems like the perfect example of what NOT to do, however, I could not figure out a way to optimize it using an apply function. Can anyone help?
designSpectrum <- function (T, pga, Sa5Hz, Sa1Hz){
Ts <- Sa1Hz / Sa5Hz
#By convention, if Sa5Hz is null, set Ts as 0.
Ts[is.nan(Ts)] <- 0
res <- matrix(NA, nrow = length(T), ncol = length(pga))
for (i in 1:nrow(res))
{
for (j in 1:ncol(res))
{
res[i,j] <- if(T[i] <= 0) {pga[j]}
else if (T[i] <= 0.2 * Ts[j]) {pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j])}
else if (T[i] <= Ts[j]) {Sa5Hz[j]}
else Sa1Hz[j] / T[i]
}
}
return(res)
}
Instead of doing a double for loop and processing each i and j value separately, you could use the outer function to process all of them in one shot. Since you're now processing multiple i and j values simultaneously, you could switch to the vectorized ifelse statement instead of the non-vectorized if and else statements:
designSpectrum2 <- function (T, pga, Sa5Hz, Sa1Hz) {
Ts <- Sa1Hz / Sa5Hz
Ts[is.nan(Ts)] <- 0
outer(1:length(T), 1:length(pga), function(i, j) {
ifelse(T[i] <= 0, pga[j],
ifelse(T[i] <= 0.2 * Ts[j], pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j]),
ifelse(T[i] <= Ts[j], Sa5Hz[j], Sa1Hz[j] / T[i])))
})
}
identical(designSpectrum(T, pga, Sa5Hz, Sa1Hz), designSpectrum2(T, pga, Sa5Hz, Sa1Hz))
# [1] TRUE
Data:
T <- -1:3
pga <- 1:3
Sa5Hz <- 2:4
Sa1Hz <- 3:5
You can see the efficiency gains by testing on rather large vectors (here I'll use an output matrix with 1 million entries):
# Larger vectors
set.seed(144)
T2 <- runif(1000, -1, 3)
pga2 <- runif(1000, -1, 3)
Sa5Hz2 <- runif(1000, -1, 3)
Sa1Hz2 <- runif(1000, -1, 3)
# Runtime comparison
all.equal(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2), designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
# [1] TRUE
system.time(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2))
# user system elapsed
# 4.038 1.011 5.042
system.time(designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
# user system elapsed
# 0.517 0.138 0.652
The approach with outer is almost 10x faster.

How to vectorize the antenna arrayfactor expression in matlab

I have the antenna array factor expression here:
I have coded the array factor expression as given below:
lambda = 1;
M = 100;N = 200; %an M x N array
dx = 0.3*lambda; %inter-element spacing in x direction
m = 1:M;
xm = (m - 0.5*(M+1))*dx; %element positions in x direction
dy = 0.4*lambda;
n = 1:N;
yn = (n - 0.5*(N+1))*dy;
thetaCount = 360; % no of theta values
thetaRes = 2*pi/thetaCount; % theta resolution
thetas = 0:thetaRes:2*pi-thetaRes; % theta values
phiCount = 180;
phiRes = pi/phiCount;
phis = -pi/2:phiRes:pi/2-phiRes;
cmpWeights = rand(N,M); %complex Weights
AF = zeros(phiCount,thetaCount); %Array factor
tic
for i = 1:phiCount
for j = 1:thetaCount
for p = 1:M
for q = 1:N
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
end
end
end
end
How can I vectorize the code for calculating the Array Factor (AF).
I want the line:
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
to be written in vectorized form (by modifying the for loop).
Approach #1: Full-throttle
The innermost nested loop generates this every iteration - cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which are to summed up iteratively to give us the final output in AF.
Let's call the exp(.... part as B. Now, B basically has two parts, one is the scalar (2*pi*1j/lambda) and the other part
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))) that is formed from the variables that are dependent on
the four iterators used in the original loopy versions - i,j,p,q. Let's call this other part as C for easy reference later on.
Let's put all that into perspective:
Loopy version had AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which is now equivalent to AF(i,j) = AF(i,j) + cmpWeights(q,p)*B, where B = exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))).
B could be simplified to B = exp((2*pi*1j/lambda)* C), where C = (xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))).
C would depend on the iterators - i,j,p,q.
So, after porting onto a vectorized way, it would end up as this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Create vectorized version of C using all four vector iterators
mult1 = bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
mult1_xm = bsxfun(#times,mult1(:),permute(xm,[1 3 2]));
mult2_yn = bsxfun(#times,mult2(:),yn);
C_vect = bsxfun(#plus,mult1_xm,mult2_yn);
%// 3) Create vectorized version of B using vectorized C
B_vect = reshape(exp((2*pi*1j/lambda)*C_vect),phiCount*thetaCount,[]);
%// 4) Final output as matrix multiplication between vectorized versions of B and C
AF_vect = reshape(B_vect*cmpWeights(:),phiCount,thetaCount);
Approach #2: Less-memory intensive
This second approach would reduce the memory traffic and it uses the distributive property of exponential - exp(A+B) = exp(A)*exp(B).
Now, the original loopy version was this -
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*...
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
So, after using the distributive property, we would endup with something like this -
K = (2*pi*1j/lambda)
part1 = K*xm(p)*sin(thetas(j))*cos(phis(i));
part2 = K*yn(q)*sin(thetas(j))*sin(phis(i));
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp(part1)*exp(part2);
Thus, the relevant vectorized approach would become something like this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Define the constant used at the start of EXP() call
K = (2*pi*1j/lambda);
%// 3) Perform the sine-cosine operations part1 & part2 in vectorized manners
mult1 = K*bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = K*bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
%// Perform exp(part1) & exp(part2) in vectorized manners
part1_vect = exp(bsxfun(#times,mult1(:),xm));
part2_vect = exp(bsxfun(#times,mult2(:),yn));
%// Perform multiplications with cmpWeights for final output
AF = reshape(sum((part1_vect*cmpWeights.').*part2_vect,2),phiCount,[])
Quick Benchmarking
Here are the runtimes with the input data listed in the question for the original loopy approach and proposed approach #2 -
---------------------------- With Original Approach
Elapsed time is 358.081507 seconds.
---------------------------- With Proposed Approach #2
Elapsed time is 0.405038 seconds.
The runtimes suggests a crazy performance improvement with Approach #2!
The basic trick is to figure out what things are constant, and what things depend on the subscript term - and therefore are matrix terms.
Within the sum:
C(n,m) is a matrix
2π/λ is a constant
sin(θ)cos(φ) is a constant
x(m) and y(n) are vectors
So the two things I would do are:
Expand the xm and ym into matrices using meshgrid()
Take all the constant term stuff outside the loop.
Like this:
...
piFactor = 2 * pi * 1j / lambda;
[xgrid, ygrid] = meshgrid(xm, ym); % xgrid and ygrid will be size (N, M)
for i = 1:phiCount
for j = 1:thetaCount
xFactor = sin(thetas(j)) * cos(phis(i));
yFactor = sin(thetas(j)) * sin(phis(i));
expFactor = exp(piFactor * (xgrid * xFactor + ygrid * yFactor)); % expFactor is size (N, M)
elements = cmpWeights .* expFactor; % elements of sum, size (N, M)
AF(i, j) = AF(i, j) + sum(elements(:)); % sum and then integrate.
end
end
You could probably figure out how to vectorise the outer loop too, but hopefully that gives you a starting point.

How to improve this algorithm by 1)use arrays, 2)avoid list concatenation (lazy lists?)?

I tried to learn how the STArray works, but I couldn't. (Doc is poor, or at least the one I found).
Any way, I have the next algorithm, but it uses a lot of !!, which is slow. How can I convert it to use the STArray monad?
-- The Algorithm prints the primes present in [1 .. n]
main :: IO ()
main = print $ primesUpTo 100
type Nat = Int
primesUpTo :: Nat -> [Nat]
primesUpTo n = primesUpToAux n 2 [1]
primesUpToAux :: Nat -> Nat -> [Nat] -> [Nat]
primesUpToAux n current primes =
if current > n
then primes
else primesUpToAux n (current + 1) newAcum
where newAcum = case isPrime current primes of
True -> primes++[current]
False -> primes
isPrime :: Nat -> [Nat] -> Bool
isPrime 1 _ = True
isPrime 2 _ = True
isPrime x neededPrimes = isPrimeAux x neededPrimes 1
isPrimeAux x neededPrimes currentPrimeIndex =
if sqrtOfX < currentPrime
then True
else if mod x currentPrime == 0
then False
else isPrimeAux x neededPrimes (currentPrimeIndex + 1)
where
sqrtOfX = sqrtNat x
currentPrime = neededPrimes !! currentPrimeIndex
sqrtNat :: Nat -> Nat
sqrtNat = floor . sqrt . fromIntegral
Edit
Oops, the !! wasn't the problem; in the next version of the algorithm (below) I've removed the use of !!; also, I fixed 1 being a prime, which is not, as pointed by #pedrorodrigues
main :: IO ()
main = print $ primesUpTo 20000
type Nat = Int
primesUpTo :: Nat -> [Nat]
primesUpTo n = primesUpToAux n 1 []
primesUpToAux :: Nat -> Nat -> [Nat] -> [Nat]
primesUpToAux n current primesAcum =
if current > n
then primesAcum
else primesUpToAux n (current + 1) newPrimesAcum
where newPrimesAcum = case isPrime current primesAcum of
True -> primesAcum++[current]
False -> primesAcum
isPrime :: Nat -> [Nat] -> Bool
isPrime 1 _ = False
isPrime 2 _ = True
isPrime x neededPrimes =
if sqrtOfX < currentPrime
then True
else if mod x currentPrime == 0
then False
else isPrime x restOfPrimes
where
sqrtOfX = sqrtNat x
currentPrime:restOfPrimes = neededPrimes
sqrtNat :: Nat -> Nat
sqrtNat = floor . sqrt . fromIntegral
Now this question is about 2 questions really:
1.- How to transform this algorithm to use arrays instead of lists? (Is for the sake of learning how to handle state and arrays in Haskell)
Which somebody already answered in the comments, but pointing to a not very good explained example.
2.- How to eliminate the concatenation of lists every time a new prime is found?
True -> primesAcum++[current]
Here's a more or less direct translation of your code into working with an unboxed array of integers:
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
import Control.Arrow
main :: IO ()
main = print . (length &&& last) . primesUpTo $ 1299709
primesUpTo :: Int -> [Int]
primesUpTo = takeWhile (> 0) . elems . primesUpToUA
primesUpToUA :: Int -> UArray Int Int
primesUpToUA n = runSTUArray $ do
let k = floor( 1.25506 * fromIntegral n / log (fromIntegral n)) -- WP formula
ar <- newArray (0,k+1) 0 -- all zeroes initially, two extra spaces
let
loop c i | c > n = return ar -- upper limit is reached
| otherwise = do -- `i` primes currently in the array
b <- isPrime 0 i c -- is `c` a prime?
if b then do { writeArray ar i c ; loop (c+1) (i+1) }
else loop (c+1) i
isPrime j i c | j >= i = return True -- `i` primes
| otherwise = do -- currently in the array
p <- readArray ar j
if p*p > c then return True
else if rem c p == 0 then return False
else isPrime (j+1) i c
loop 2 0
This is more or less self-explanatory, when you read it slowly, one statement at a time.
Using arrays, there's no problems with list concatenation, as there are no lists. We use the array of primes as we're adding new items to it.
Of course you can re-write your list-based code to behave better; the simplest re-write is
ps :: [Int]
ps = 2 : [i | i <- [3..],
and [rem i p > 0 | p <- takeWhile ((<=i).(^2)) ps]]
primesTo n = takeWhile (<= n) ps
The key is to switch from recursive thinking to corecursive thinking - not how to add at the end, explicitly, but to define how a list is to be produced — and let the lazy semantics take care of the rest.

How to translate this list-based code into using mutable arrays?

EDIT3: I'm writing a code to process very long input list of Ints with only few hundred non-duplicates. I use two auxiliary lists to maintain cumulative partial sums to calculate some accumulator value, the how's and why's are non-important. I want to ditch all lists here and turn it into nice destructive loop, and I don't know how. I don't need the whole code, just a skeleton code would be great, were read/write is done to two auxiliary arrays and some end result is returned. What I have right now would run 0.5 hour for the input. I've coded this now in C++, and it runs in 90 seconds for the same input.
I can't understand how to do this, at all. This is the list-based code that I have right now:(but the Map-based code below is clearer)
ins :: (Num b, Ord a) => a -> b -> [(a, b)] -> ([(a, b)], b)
ins n x [] = ( [(n,x)], 0)
ins n x l#((v, s):t) =
case compare n v of
LT -> ( (n,s+x) : l , s )
EQ -> ( (n,s+x) : t , if null t then 0 else snd (head t))
GT -> let (u,z) = ins n x t
in ((v,s+x):u,z)
This is used in a loop, to process a list of numbers of known length, (changed it to foldl now)
scanl g (0,([],[])) ns -- ns :: [Int]
g ::
(Num t, Ord t, Ord a) =>
(t, ([(a, t)], [(a, t)])) -> a -> (t, ([(a, t)], [(a, t)]))
g (c,( a, b)) n =
let
(a2,x) = ins n 1 a
(b2,y) = if x>0 then ins n x b else (b,0)
c2 = c + y
in
(c2,( a2, b2))
This works, but I need to speed it up. In C, I would keep the lists (a,b) as arrays; use binary search to find the element with the key just above or equal to n (instead of the sequential search used here); and use in-place update to change all the preceding entries.
I'm only really interested in final value. How is this done in Haskell, with mutable arrays?
I tried something, but I really don't know what I'm doing here, and am getting strange and very long error messages (like "can not deduce ... from context ..."):
goarr top = runSTArray $ do
let sz = 10000
a <- newArray (1,sz) (0,0) :: ST s (STArray s Int (Integer,Integer))
b <- newArray (1,sz) (0,0) :: ST s (STArray s Int (Integer,Integer))
let p1 = somefunc 2 -- somefunc :: Integer -> [(Integer, Int)]
go1 p1 2 0 top a b
go1 p1 i c top a b =
if i >= top
then
do
return c
else
go2 p1 i c top a b
go2 p1 i c top a b =
do
let p2 = somefunc (i+1) -- p2 :: [(Integer, Int)]
let n = combine p1 p2 -- n :: Int
-- update arrays and calc new c
-- like the "g" function is doing:
-- (a2,x) = ins n 1 a
-- (b2,y) = if x>0 then ins n x b else (b,0)
-- c2 = c + y
go1 p2 (i+1) c2 top a b -- a2 b2??
This doesn't work at all. I don't even know how to encode loops in do notation. Please help.
UPD: the Map based code that runs 3 times slower:
ins3 :: (Ord k, Num a) => k -> a -> Map.Map k a -> (Map.Map k a, a)
ins3 n x a | Map.null a = (Map.insert n x a , 0)
ins3 n x a = let (p,q,r) = Map.splitLookup n a in
case q of
Nothing -> (Map.union (Map.map (+x) p)
(Map.insert n (x+leftmost r) r) , leftmost r)
Just s -> (Map.union (Map.map (+x) p)
(Map.insert n (x+s) r) , leftmost r)
leftmost r | Map.null r = 0
| otherwise = snd . head $ Map.toList r
UPD2: The error message is " Could not deduce (Num (STArray s1 i e)) from the context () arising from the literal `0' at filename.hs:417:11"
that's where it says return c in go1 function. Perhaps c is expected to be an array, but I want to return the accumulator value that is built while using the two auxiliary arrays.
EDIT3: I've replaced scanl and (!!) with foldl and take as per Chris's advice, and now it runs in constant space with sane empirical complexity and is actually projected to finish in under 0.5 hour - a.o.t. ... 3 days ! I knew about it of course but was so sure GHC optimizes the stuff away for me, surely it wouldn't make that much of a difference, I thought! And so felt only mutable arrays could help... Bummer.
Still, C++ does same in 90 sec, and I would very much appreciate help in learning how to code this with mutable arrays, in Haskell.
Are the input values ever EQ? If they are not EQ then the way scanl g (0,([],[])) ns is used means that the first [(,)] array, call it a always has map snd a == reverse [1..length a] at each stage of g. For example, in a length 10 list the value of snd (a !! 4) is going to be 10-4. Keeping these reversed index values by mutating the second value of each preceding entry in a is quite wasteful. If you need speed then this is one place to make a better algorithm.
None of this applies to the second [(,)] whose purpose is still mysterious to me. It records all insertions that were not done at the end of a, so perhaps it allows one to reconstruct the initial sequence of values.
You said "I'm only really interested in final value." Do you mean you only care about the last value in list output by the scanl .. line? If so then you need a foldl instead of scanl.
Edit: I am adding a non-mutable solution using a custom Finger Tree. It passes my ad hoc testing (at bottom of code):
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Monoid
import Data.FingerTree
data Entry a v = E !a !v deriving Show
data ME a v = NoF | F !(Entry a v) deriving Show
instance Num v => Monoid (ME a v) where
mempty = NoF
NoF `mappend` k = k
k `mappend` NoF = k
(F (E _a1 v1)) `mappend` (F (E a2 v2)) = F (E a2 (v1 + v2))
instance Num v => Measured (ME a v) (Entry a v) where
measure = F
type M a v = FingerTree (ME a v) (Entry a v)
getV NoF = 0
getV (F (E _a v)) = v
expand :: Num v => M a v -> [(a, v)]
expand m = case viewl m of
EmptyL -> []
(E a _v) :< m' -> (a, getV (measure m)) : expand m'
ins :: (Ord a, Num v) => a -> v -> M a v -> (M a v, v)
ins n x m =
let comp (F (E a _)) = n <= a
comp NoF = False
(lo, hi) = split comp m
in case viewl hi of
EmptyL -> (lo |> E n x, 0)
(E v s) :< higher | n < v ->
(lo >< (E n x <| hi), getV (measure hi))
| otherwise ->
(lo >< (E n (s+x) <| higher), getV (measure higher))
g :: (Num t, Ord t, Ord a) =>
(t, (M a t, M a t)) -> a -> (t, (M a t, M a t))
g (c, (a, b)) n =
let (a2, x) = ins n 1 a
(b2, y) = if x>0 then ins n x b else (b, 0)
in (c+y, (a2, b2))
go :: (Ord a, Num v, Ord v) => [a] -> (v, ([(a, v)], [(a, v)]))
go ns = let (t, (a, b)) = foldl g (0, (mempty, mempty)) ns
in (t, (expand a, expand b))
up = [1..6]
down = [5,4..1]
see'tests = map go [ up, down, up ++ down, down ++ up ]
main = putStrLn . unlines . map show $ see'test
Slightly unorthodox, I am adding a second answer using a mutable technique. Since user1308992 mentioned Fenwick trees, I have used them to implement the algorithm. Two STUArray are allocated and mutated during the run. The basic Fenwick tree keeps totals for all smaller indices and the algorithm here needs totals for all larger indices. This change is handled by the (sz-x) subtraction.
import Control.Monad.ST(runST,ST)
import Data.Array.ST(STUArray,newArray)
import Data.Array.Base(unsafeRead, unsafeWrite)
import Data.Bits((.&.))
import Debug.Trace(trace)
import Data.List(group,sort)
{-# INLINE lsb #-}
lsb :: Int -> Int
lsb i = (negate i) .&. i
go :: [Int] -> Int
go xs = compute (maximum xs) xs
-- Require "top == maximum xs" and "all (>=0) xs"
compute :: Int -> [Int] -> Int
compute top xs = runST mutating where
-- Have (sz - (top+1)) > 0 to keep algorithm simple
sz = top + 2
-- Reversed Fenwick tree (no bounds checking)
insert :: STUArray s Int Int -> Int -> Int -> ST s ()
insert arr x v = loop (sz-x) where
loop i | i > sz = return ()
| i <= 0 = error "wtf"
| otherwise = do
oldVal <- unsafeRead arr i
unsafeWrite arr i (oldVal + v)
loop (i + lsb i)
getSum :: STUArray s Int Int -> Int -> ST s Int
getSum arr x = loop (sz - x) 0 where
loop i acc | i <= 0 = return acc
| otherwise = do
val <- unsafeRead arr i
loop (i - lsb i) $! acc + val
ins n x arr = do
insert arr n x
getSum arr (succ n)
mutating :: ST s Int
mutating = do
-- Start index from 0 to make unsafeRead, unsafeWrite easy
a <- newArray (0,sz) 0 :: ST s (STUArray s Int Int)
b <- newArray (0,sz) 0 :: ST s (STUArray s Int Int)
let loop [] c = return c
loop (n:ns) c = do
x <- ins n 1 a
y <- if x > 0
then
ins n x b
else
return 0
loop ns $! c + y
-- Without debugging use the next line
-- loop xs 0
-- With debugging use the next five lines
c <- loop xs 0
a' <- see a
b' <- see b
trace (show (c,(a',b'))) $ do
return c
-- see is only used in debugging
see arr = do
let zs = map head . group . sort $ xs
vs <- sequence [ getSum arr z | z <- zs ]
let ans = filter (\(a,v) -> v>0) (zip zs vs)
return ans
up = [1..6]
down = [5,4..1]
see'tests = map go [ up, down, up ++ down, down ++ up ]
main = putStrLn . unlines . map show $ see'tests

Resources