Array division comparison between Matlab and Julia [duplicate] - arrays

A \ B in matlab gives a special solution while numpy.linalg.lstsq doesn't.
A = [1 2 0; 0 4 3];
b = [8; 18];
c_mldivide = A \ b
c_mldivide =
0
4
0.66666666666667
c_lstsq = np.linalg.lstsq([[1 ,2, 0],[0, 4, 3]],[[8],[18]])
print c_lstsq
c_lstsq = (array([[ 0.91803279],
[ 3.54098361],
[ 1.27868852]]), array([], dtype=float64), 2, array([ 5.27316304,1.48113184]))
How does mldivide A \ B in matlab give a special solution?
Is this solution usefull in achieving computational accuracy?
Why is this solution special and how might you implement it in numpy?

For under-determined systems such as yours (rank is less than the number of variables), mldivide returns a solution with as many zero values as possible. Which of the variables will be set to zero is up to its arbitrary choice.
In contrast, the lstsq method returns the solution of minimal norm in such cases: that is, among the infinite family of exact solutions it will pick the one that has the smallest sum of squares of the variables.
So, the "special" solution of Matlab is somewhat arbitrary: one can set any of the three variables to zero in this problem. The solution given by NumPy is in fact more special: there is a unique minimal-norm solution
Which solution is better for your purpose depends on what your purpose is. The non-uniqueness of solution is usually a reason to rethink your approach to the equations. But since you asked, here is NumPy code that produces Matlab-type solutions.
import numpy as np
from itertools import combinations
A = np.matrix([[1 ,2, 0],[0, 4, 3]])
b = np.matrix([[8],[18]])
num_vars = A.shape[1]
rank = np.linalg.matrix_rank(A)
if rank == num_vars:
sol = np.linalg.lstsq(A, b)[0] # not under-determined
else:
for nz in combinations(range(num_vars), rank): # the variables not set to zero
try:
sol = np.zeros((num_vars, 1))
sol[nz, :] = np.asarray(np.linalg.solve(A[:, nz], b))
print(sol)
except np.linalg.LinAlgError:
pass # picked bad variables, can't solve
For your example it outputs three "special" solutions, the last of which is what Matlab chooses.
[[-1. ]
[ 4.5]
[ 0. ]]
[[ 8.]
[ 0.]
[ 6.]]
[[ 0. ]
[ 4. ]
[ 0.66666667]]

Related

Most computationally efficient way to batch alter values in each array of a 2d array, based on conditions for particular values by indices

Say that I have a batch of arrays, and I would like to alter them based on conditions of particular values located by indices.
For example, say that I would like to increase and decrease particular values if the difference between those values are less than two.
For a single 1D array it can be done like this
import numpy as np
single2 = np.array([8, 8, 9, 10])
if abs(single2[1]-single2[2])<2:
single2[1] = single2[1] - 1
single2[2] = single2[2] + 1
single2
array([ 8, 7, 10, 10])
But I do not know how to do it for batch of arrays. This is my initial attempt
import numpy as np
single1 = np.array([6, 0, 3, 7])
single2 = np.array([8, 8, 9, 10])
single3 = np.array([2, 15, 15, 20])
batch = np.array([
np.copy(single1),
np.copy(single2),
np.copy(single3),
])
if abs(batch[:,1]-batch[:,2])<2:
batch[:,1] = batch[:,1] - 1
batch[:,2] = batch[:,2] + 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Looking at np.any and np.all, they are used to create an array of booleans values, and I am not sure how they could be used in the code snippet above.
My second attempt uses np.where, using the method described here for comparing particular values of a batch of arrays by creating new versions of the arrays with values added to the front/back of the arrays.
https://stackoverflow.com/a/71297663/3259896
In the case of the example, I am comparing values that are right next to each other, so I created copies that shift the arrays forwards and backwards by 1. I also use only the particular slice of the array that I am comparing, since the other numbers would also be used in the comparison in np.where.
batch_ap = np.concatenate(
(batch[:, 1:2+1], np.repeat(-999, 3).reshape(3,1)),
axis=1
)
batch_pr = np.concatenate(
(np.repeat(-999, 3).reshape(3,1), batch[:, 1:2+1]),
axis=1
)
Finally, I do the comparisons, and adjust the values
batch[:, 1:2+1] = np.where(
abs(batch_ap[:,1:]-batch_ap[:,:-1])<2,
batch[:, 1:2+1]-1,
batch[:, 1:2+1]
)
batch[:, 1:2+1] = np.where(
abs(batch_pr[:,1:]-batch_pr[:,:-1])<2,
batch[:, 1:2+1]+1,
batch[:, 1:2+1]
)
print(batch)
[[ 6 0 3 7]
[ 8 7 10 10]
[ 2 14 16 20]]
Though I am not sure if this is the most computationally efficient nor programmatically elegant method for this task. Seems like a lot of operations and code for the task, but I do not have a strong enough mastery of numpy to be certain about this.
This works
mask = abs(batch[:,1]-batch[:,2])<2
batch[mask,1] -= 1
batch[mask,2] += 1

General multiplication in numpy for scalars and arrays possible?

For a general code that should be able to take scalars and arrays, I would like "#" to work with scalars, e.g.
a = 4
b = 3
as well as for arrays
a = np.array([[1, 2],[3, 4]])
b = np.array([1, 2])
The goal would that I can use for both cases
a#b
Does anyone know if this is possible?
Update 2021-06-17
#hpaulj thanks for the response.
I updated above to hopefully clarify. Currently I have a if-query to check if scalar or array and separate calculations with * and #, respectively. I was thinking it would simplify things if there was just one operation.
Maybe a stupid follow up:
Would it make sense to extend #/np.matmul to be able to do the scalar multiplication? Or is there a reason that would make this a bad idea?
It looks like numpy.dot is the most general code available for multiplications.
# is numpy.matmult and not numpy.dot, which are indeed not the same (see https://numpy.org/doc/stable/reference/generated/numpy.matmul.html):
matmul differs from dot in two important ways:
Multiplication by scalars is not allowed, use * instead.
Stacks of matrices are broadcast together as if the matrices were elements, respecting the signature (n,k),(k,m)->(n,m):
I still have to see the consequence of this aspect for my code.
Here is a simple code to show what is going on:
import numpy as np
# with matrix (only up to 2D)
a = np.matrix([[1, 2], [3, 4]])
b = np.matrix([[5, 6]]).T
c1 = a*b
c2 = a#b
c3 = np.dot(a, b)
# with array (general)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6]]).T
C1 = A*B
C2 = A#B
C3 = np.dot(A, B)
# scalar
aS = 2
bS = 3
cS1 = aS*bS
#cS2 = aS#bS # does not work
cS3 = np.dot(aS, bS)

How to remove all occurrences of an element from NumPy array? [duplicate]

This question already has answers here:
Deleting certain elements from numpy array using conditional checks
(3 answers)
Remove all occurrences of a value from a list?
(26 answers)
Closed 4 years ago.
The title is pretty self-explanatory: I have an numpy array like (let's say ints)
[ 1 2 10 2 12 2 ] and I would like to remove all occurrences of 2, so that the resulting array is [ 1 10 12 ]. Preferably I would like to do this as fastest as possible, because I am using relatively large arrays.
NumPy has a function called numpy.delete() but it takes the indexes as an argument, which I do not have.
Edit: The question is indeed different from Deleting certain elements from numpy array using conditional checks, which is I guess a more "general" case. However, the idea of removing occurrences from an array is fundamental enough to merit its own explicit question, so I am keeping the question.
You can use indexing:
arr = np.array([1, 2, 10, 2, 12, 2])
print(arr[arr != 2])
# [ 1 10 12]
Timing is pretty good:
from timeit import Timer
arr = np.array(range(5000))
print(min(Timer(lambda: arr[arr != 4999]).repeat(500, 500)))
# 0.004942436999999522
you can use another numpy function.It is numpy.setdiff1d(ar1, ar2, assume_unique=False).
This function Finds the set difference of two arrays.
import numpy as np
a = np.array([1, 2, 10, 2,12, 2])
b = np.array([2])
c = np.setdiff1d(a,b,True)
print(c)
There are several ways to do this. I suggest you use a mask:
import numpy as np
a = np.array([ 1, 2 ,10, 2, 12, 2 ])
a[~np.isin(a, 2)]
>> array([ 1, 10, 12])
np.isin is convenient because you can apply the filter to multiple elements at once if you need to:
a[~np.isin(a, (1,2))]
>> array([ 10, 12])
Also note that a[mask] is a slice of the original array. This is memory efficient; but if you need to create a new array with your filtered values and leave the original ones untouched, use .copy, e.g.:
b = a[~np.isin(a, (1,2))].copy()

Why does deepcopy change values of numpy array?

I am having a problem in which values in a Numpy array change after copying it with copy.deepcopy or numpy.copy, in fact, I get different values if I just print the array first before copying it.
I am using Python 3.5, Numpy 1.11.1, Scipy 0.18.0
My starting array is contained in a list of tuples; each tuple is pair: a float (a time point) and a numpy array (the solution of an ODE at that time point), e.g.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
In this case, I want the array for the last time point.
When I call the following:
tandy = c1.IntegrateColony(3)
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get something that makes sense for the system I'm trying to simulate:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
However, with the following:
tandy = c1.IntegrateColony(3)
print(tandy[-1][1])
ylast = copy.deepcopy(tandy[-1][1])
print(ylast)
I get all zeros:
[0.00000000e+00 0.00000000e+00 ... 0.00000000e+00 0.00000000e+00]
[ 0. 0. ... 0. 0.]
I should add, with larger systems and different parameters, displaying tandy[k][1] (either with print() or just by calling it in the command line) shows all non-zero values that are all very close to zero, i.e. <1e-70, but that's still not sensible for the system.
With:
tandy = c1.IntegrateColony(3)
ylast = np.copy(tandy[-1][1])
print(ylast)
I get sensible output again:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
The function that generates 'tandy' is the following (edited for clarity), which uses scipy.integrate.ode, and the set_solout method to get the solution at intermediate time points:
def IntegrateColony(self, tmax=1):
# I edited out initialization of dCdt & first_step for clarity.
y = ode(dCdt)
y.set_integrator('dopri5', first_step=dt0, nsteps=2000)
sol = []
def solout(tcurrent, ytcurrent):
sol.append((tcurrent, ytcurrent))
y.set_solout(solout)
y.set_initial_value(y=C0, t=0)
yfinal = y.integrate(tmax)
return sol
Although I could get the last time point by returning yfinal, I'd like to get the whole time course once I figure out why it's behaving the way it is.
Thanks for your suggestions!
Mickey
Edit:
If I print all of sol (print(tandy) or print(IntegrateColony...), it comes out as shown above (with the values in the arrays as 0), i.e.:
[(0.0, array([ 0., ... 0.])), ...
(3.0, array([ 0., ... 0.]))]
However, if I copy it with (y = copy.deepcopy(tandy); print(y)), the arrays take on values between 1e-7 and 1e+1.
If I do print(tandy[-1][1]) twice in a row, they're filled with zeros, but the format changes (from 0.0000 to 0.).
One other feature I noticed while following the suggestions in LutzL's and hpaulj's comments: if I run tandy = c1.IntegrateColony(3) in the console (running Spyder), the arrays are filled with zeros in the variable explorer. However, if I run the following in the console:
tandy = c1.IntegrateColony(3); ylast=copy.deepcopy(tandy)
Both the arrays in tandy and in ylast are filled with values in the range I would expect, and print(tandy[-1][1]) now gives:
[7.14923891e-07 7.14923891e-07 ... 8.26478813e-01 8.85589634e-01]
Even if I find a solution that stops this behavior, I'd appreciate anyone's insight about what's going on so I don't make the same mistakes again.
Thanks!
Edit:
Here's a simple case that gives this behavior:
import numpy as np
from scipy.integrate import ode
def testODEint(tmax=1):
C0 = np.ones((3,))
# C0 = 1 # This seems to behave the same
def dCdt_simpleinputs(t, C):
return C
y = ode(dCdt_simpleinputs)
y.set_integrator('dopri5')
sol = []
def solout(tcurrent, ytcurrent):
sol.append((tcurrent, ytcurrent)) # Behaves oddly
# sol.append((tcurrent, ytcurrent.copy())) # LutzL's idea: Works
y.set_solout(solout)
y.set_initial_value(y=C0, t=0)
yfinal = y.integrate(tmax)
return sol
tandy = testODEint(1)
ylast = np.copy(tandy[-1][1])
print(ylast) # Expect same values as tandy[-1][1] below
tandy = testODEint(1)
tandy[-1][1]
print(tandy[-1][1]) # Expect same values as ylast above
When I run this, I get the following output for ylast and tandy[-1][1]:
[ 2.71828196 2.71828196 2.71828196]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00]
The code I was working on when I ran into this problem is an embarrassing mess, but if you want to take a look, an old version is here: https://github.com/mvondassow/BryozoanModel2
The details of why this is happening are tied to how ytcurrent is handled in integrate. But there are various contexts in Python where all values of a list end up the same - contrary to expectations.
For example:
In [159]: x
Out[159]: [0, 1, 2]
In [160]: x=[]
In [161]: y=np.array([1,2,3])
In [162]: for i in range(3):
...: y += i
...: x.append(y)
In [163]: x
Out[163]: [array([4, 5, 6]), array([4, 5, 6]), array([4, 5, 6])]
All elements of x have the same value - because they all are pointers to the same y, and thus show its final value.
but if I copy y before appending it to the list, I see the changes.
In [164]: x=[]
In [165]: for i in range(3):
...: y += i
...: x.append(y.copy())
In [166]: x
Out[166]: [array([4, 5, 6]), array([5, 6, 7]), array([7, 8, 9])]
In [167]:
Now that does not explain why the print statement changes the values. But that whole solout callback mechanism is a bit obscure. I wonder if there are any warnings in scipy about pitfalls in defining such a callback?

Julia Approach to python equivalent list of lists

I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. For example, in Python (although not very efficient or pythonic), I would create an empty list and append a list of a known size and type, and then convert to a NumPy array:
Python Snippet
a = []
for ....
a.append([1.,2.,3.,4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
push!(a,[1.,2.,3.,4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
a = vcat(a, [1. 2. 3. 4.])
end
# Building up as columns
b = [1.,2.,3.,4.]
for i in 1:3
b = hcat(b, [1.,2.,3.,4.])
end
But this isn't a natural pattern in Julia, you'd do something like
A = zeros(4,4)
for i in 1:4, j in 1:4
A[i,j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i=1:3
push!(a,[1.,2.,3.,4.])
end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
a.append([1, 2, 3, 4])
....:
In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
I should add that this is probably not what you actually want to be doing as the hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2d array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
The other answers don't work if the number of loop iterations isn't known in advance, or assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function for "take this list of N-D arrays and return me a new (N+1)-D array".
Julia requires a different concatenation solution depending on the dimension of the underlying data. So, for example, if the underlying elements of a are vectors, one can use hcat(a) or cat(a,dims=2). But, if a is e.g a 2D array, one must use cat(a,dims=3), etc. The dims argument to cat is not optional, and there is no default value to indicate "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array)
function collapse(x)
return cat(x...,dims=length(size(x[1]))+1)
end
One would use this as
a = []
for ...
... compute new_a...
push!(a,new_a)
end
a = collapse(a)

Resources