PyTorch - Getting gradient for intermediate variables / tensors

As an exercise in the PyTorch framework (0.4.1), I am trying to display the gradient of X (gX, or dSdX) in a simple linear layer (Z = X.W + B). To simplify my toy example, I call backward() from a sum of Z (not a loss).
To sum up, I want gX (dSdX) of S = sum(XW + B).
The problem is that the gradient of Z (dSdZ) is None, and as a result gX is of course wrong too.
import torch
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
Result:
Z:
tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
None
gX:
tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])
I have exactly the same result if I use nn.Module as below:
class Net1Linear(torch.nn.Module):
    def __init__(self, wi, wo, W, B):
        super(Net1Linear, self).__init__()
        self.linear1 = torch.nn.Linear(wi, wo)
        self.linear1.weight = torch.nn.Parameter(W.t())
        self.linear1.bias = torch.nn.Parameter(B)

    def forward(self, x):
        return self.linear1(x)
net = Net1Linear(3,2,W,B)
Z = net(X)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

First of all, gradients are only computed for tensors on which you enabled gradient tracking by setting requires_grad to True.
So your output is just as one would expect: you get the gradient for X.
PyTorch does not save gradients of intermediate results, for performance reasons; you only get the gradient for those tensors on which you set requires_grad to True.
However, you can use register_hook to extract the intermediate gradient during the backward pass, or to save it manually. Here I just save it to the grad attribute of the tensor Z:
import torch
# function to extract grad
def set_grad(var):
    def hook(grad):
        var.grad = grad
    return hook
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
# register_hook for Z
Z.register_hook(set_grad(Z))
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
This will output:
Z:
tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
tensor([[1., 1.],
        [1., 1.]])
gX:
tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])
Hope this helps!
Btw.: normally you would want gradients enabled for your parameters, i.e. your weights and biases. Because with your current setup, an optimizer would alter your inputs X and not your weights W and bias B. So usually gradients are activated for W and B in such a case.
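As a side note, here is a minimal sketch of that usual setup (the optimizer choice and learning rate are just placeholders for the example):
import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]])  # inputs: no grad needed
W = torch.randn(3, 2, requires_grad=True)             # parameters: grad enabled
B = torch.randn(2, requires_grad=True)
opt = torch.optim.SGD([W, B], lr=0.1)

S = torch.sum(X @ W + B)   # toy objective, as in the question
S.backward()               # fills W.grad and B.grad; X.grad stays None
opt.step()                 # updates W and B in place
opt.zero_grad()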

There's a much simpler way. Simply use retain_grad():
https://pytorch.org/docs/stable/autograd.html#torch.Tensor.retain_grad
Z.retain_grad()
before calling backward()
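For completeness, a minimal sketch of the toy example above using retain_grad() (same numbers, just the one extra call):
import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
Z.retain_grad()   # keep the gradient of this non-leaf tensor
S = torch.sum(Z)
S.backward()
print("gZ:\n", Z.grad)   # tensor of ones instead of None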

blue-phoenox, thanks for your answer. I am pretty happy to have heard about register_hook().
What led me to think that I had a wrong gX is that it was independent of the values of X. Doing the math explains it: S = sum(XW + B) is linear in X, so dSdX is constant, and each row of gX is just the row sums of W. But using a cross-entropy loss instead of SUM makes things much cleaner, so I updated the example for those who might be interested. Using SUM was a bad idea in this case.
T_dec = torch.tensor([0, 1])
X = torch.tensor([[0.5, 0.8, 2.1], [0.7, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.7, 0.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
print("Z:\n", Z)
L = torch.nn.CrossEntropyLoss()(Z,T_dec)
Z.register_hook(lambda gZ: print("gZ:\n",gZ))
L.backward()
print("gX:\n", X.grad)
Result:
Z:
tensor([[1.7500, 2.6600],
        [3.0700, 1.3100]], grad_fn=<ThAddmmBackward>)
gZ:
tensor([[-0.3565,  0.3565],
        [ 0.4266, -0.4266]])
gX:
tensor([[-0.7843,  0.6774,  0.3209],
        [ 0.9385, -0.8105, -0.3839]])
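As a sanity check (my own sketch, not part of the original post): for cross-entropy with the default mean reduction, gZ should equal (softmax(Z) - one_hot(T)) / batch_size, and that indeed reproduces the numbers above:
import torch

Z = torch.tensor([[1.7500, 2.6600], [3.0700, 1.3100]])
T_dec = torch.tensor([0, 1])
one_hot = torch.zeros_like(Z)
one_hot[torch.arange(Z.shape[0]), T_dec] = 1.0
gZ = (torch.softmax(Z, dim=1) - one_hot) / Z.shape[0]   # mean over the batch
print(gZ)   # matches the hook's printout above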

Related

TensorFlow Probability: transform event shape of JointDistribution

I would like to create a distribution for n categorical variables C_1, ..., C_n whose event shape is n. Using JointDistributionSequentialAutoBatched, the event shape is a list [[], ..., []]. For example, for n=2:
import tensorflow_probability.python.distributions as tfd
probs = [
    [0.8, 0.2],      # C_1 in {0,1}
    [0.3, 0.3, 0.4]  # C_2 in {0,1,2}
]
D = tfd.JointDistributionSequentialAutoBatched([tfd.Categorical(probs=p) for p in probs])
>>> D
<tfp.distributions.JointDistributionSequentialAutoBatched 'JointDistributionSequentialAutoBatched' batch_shape=[] event_shape=[[], []] dtype=[int32, int32]>
How do I reshape it to get event shape [2]?
A few different approaches could work here:
Create a batch of Categorical distributions and then use tfd.Independent to reinterpret the batch dimension as the event:
vector_dist = tfd.Independent(
    tfd.Categorical(
        probs=[
            [0.8, 0.2, 0.0],  # C_1 in {0,1}
            [0.3, 0.3, 0.4]   # C_2 in {0,1,2}
        ]),
    reinterpreted_batch_ndims=1)
Here I added an extra zero to pad out probs so that both distributions can be represented by a single Categorical object.
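A quick check of the resulting shapes (my addition, not from the original answer):
import tensorflow_probability as tfp
tfd = tfp.distributions

vector_dist = tfd.Independent(
    tfd.Categorical(probs=[[0.8, 0.2, 0.0], [0.3, 0.3, 0.4]]),
    reinterpreted_batch_ndims=1)
print(vector_dist.event_shape)   # [2]
print(vector_dist.sample())      # a single length-2 vector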
Use the Blockwise distribution, which stuffs its component distributions into a single vector (as opposed to the JointDistribution classes, which return them as separate values):
vector_dist = tfd.Blockwise([tfd.Categorical(probs=p) for p in probs])
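The same check applies here; if I'm not mistaken, the Blockwise version also reports a flattened event shape:
blockwise_dist = tfd.Blockwise([tfd.Categorical(probs=p) for p in probs])
print(blockwise_dist.event_shape)   # [2]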
The closest to a direct answer to your question is to apply the Split bijector, whose inverse is Concat, to the joint distribution:
tfb = tfp.bijectors
D = tfd.JointDistributionSequentialAutoBatched(
    [tfd.Categorical(probs=[p]) for p in probs])
vector_dist = tfb.Invert(tfb.Split(2))(D)
Note that I had to awkwardly write probs=[p] instead of just probs=p. This is because the Concat bijector, like tf.concat, can't change the tensor rank of its argument: it can concatenate small vectors into a big vector, but not scalars into a vector. So we have to ensure that its inputs are themselves vectors. This could be avoided if TFP had a Stack bijector analogous to tf.stack / tf.unstack (it doesn't currently, but there's no reason such a bijector couldn't exist).
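Presumably (an untested note on my part) this transformed distribution then reports the flattened event shape as well:
print(vector_dist.event_shape)   # [2], the two categoricals concatenated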

Python 3.7: Modelling a 2D Gaussian equation using a NumPy meshgrid and arrays without iterating through each point

I am currently trying to write my own 2D Gaussian function as a coding exercise, and have been able to create the following script:
import numpy as np
import matplotlib.pyplot as plt

def Gaussian2D_v1(coords=None,  # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,         # x-coordinate of peak centre.
                  yo=0,         # y-coordinate of peak centre.
                  sigma_x=1,    # Standard deviation in x.
                  sigma_y=1,    # Standard deviation in y.
                  rho=0,        # Correlation coefficient.
                  offset=0):    # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    G_array = []
    # Calculate pixel by pixel
    # Iterate through row last
    for i in range(0, np.shape(y)[0]):
        # Iterate through column first
        for j in range(0, np.shape(x)[1]):
            mat_coords = np.asarray([[x[i, j] - xo],
                                     [y[i, j] - yo]])
            G = (amplitude * np.exp(-0.5 * np.matmul(np.matmul(mat_coords.T,
                                                               mat_cov_inv),
                                                     mat_coords)) + offset)
            G_array.append(G)
    G_array = np.asarray(G_array)
    G_array = G_array.reshape(64, 64)
    return G_array.ravel()

coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
                        amplitude=20,
                        xo=32,
                        yo=32,
                        sigma_x=6,
                        sigma_y=3,
                        rho=0.8,
                        offset=20).reshape(64, 64)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
The code as it is works, but as you can see, I am currently iterating through the mesh grid one point at a time, and appending each point to a list, which is then converted to an array and re-shaped to give the 2D Gaussian distribution.
How can I modify the script to forgo using a nested "for" loop and have the program consider the whole meshgrid for matrix calculations? Is such a method possible?
Thanks!
Of course there is a solution: NumPy is all about array operations and vectorization of the code! np.matmul can take arguments with more than two dimensions and apply the matrix multiplication on the last two axes only (with the calculation performed in parallel over the other axes). However, getting the axes in the right order can be tricky.
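Before the edited code, a tiny illustration of that broadcasting behaviour (shapes made up for the example):
import numpy as np

a = np.ones((3, 4, 1, 2))     # a (3, 4) grid of 1x2 row vectors
m = np.ones((2, 2))           # one 2x2 matrix
print(np.matmul(a, m).shape)  # (3, 4, 1, 2): matmul acts on the last two axes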
Here is your edited code:
import numpy as np
import matplotlib.pyplot as plt

def Gaussian2D_v1(coords,       # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,         # x-coordinate of peak centre.
                  yo=0,         # y-coordinate of peak centre.
                  sigma_x=1,    # Standard deviation in x.
                  sigma_y=1,    # Standard deviation in y.
                  rho=0,        # Correlation coefficient.
                  offset=0):    # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    # PB We stack the coordinates along the last axis
    mat_coords = np.stack((x - xo, y - yo), axis=-1)
    G = amplitude * np.exp(-0.5 * np.matmul(np.matmul(mat_coords[:, :, np.newaxis, :],
                                                      mat_cov_inv),
                                            mat_coords[..., np.newaxis])) + offset
    return G.squeeze()

coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
                        amplitude=20,
                        xo=32,
                        yo=32,
                        sigma_x=6,
                        sigma_y=3,
                        rho=0.8,
                        offset=20)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
So, the equation is exp(-0.5 * (X - µ)' Cinv (X - µ)), where X is our coordinate matrix, µ the mean (xo, yo) and Cinv the inverse covariance matrix (' denotes a transpose). In the code, I stack both meshgrids along a new last axis, so that mat_coords has a shape of (Ny, Nx, 2). In the first np.matmul call, I add a new axis so that the shapes go like (Ny, Nx, 1, 2) * (2, 2) = (Ny, Nx, 1, 2); as you see, the matrix multiplication is done on the two last axes, in parallel over the others. Then I add a new axis to the second operand so that (Ny, Nx, 1, 2) * (Ny, Nx, 2, 1) = (Ny, Nx, 1, 1).
The .squeeze() call returns a version without the two last singleton axes.
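An equivalent formulation, if you prefer np.einsum (my sketch, same maths as above):
import numpy as np

x, y = np.meshgrid(np.arange(64), np.arange(64))
mat_coords = np.stack((x - 32.0, y - 32.0), axis=-1)  # (Ny, Nx, 2)
mat_cov_inv = np.linalg.inv(np.array([[36.0, 14.4],
                                      [14.4, 9.0]]))  # sigma_x=6, sigma_y=3, rho=0.8
quad = np.einsum('...i,ij,...j->...', mat_coords, mat_cov_inv, mat_coords)
G = 20 * np.exp(-0.5 * quad) + 20                     # already (Ny, Nx), no squeeze needed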

When to use transposition for plotting contour in Julia

So I tried to plot a contour in Julia by interpolating a 2D function, using the following code:
using Interpolations
using Plots
gr()
xs = 1:0.5:5
ys = 1:0.5:8
# The function to be plotted
f(x, y) = (3x + y ^ 2)
g = Float64[f(x,y) for x in xs, y in ys]
# Interpolate the function
g_int = interpolate(g, BSpline(Quadratic(Line(OnCell()))))
# Scale the interpolated function to the correct grid
gs_int = scale(g_int, xs, ys)
xc = 1:0.1:5
yc = 1:0.1:5
# Compare the real value and the interpolated value of the function at an arbitrary point
println("gs_int(3.2, 3.2) = ", gs_int(3.2, 3.2))
println("f(3.2, 3.2) = ", f(3.2, 3.2))
# Contour of the interpolated plot
p1 = contour(xs, ys, gs_int(xs, ys), fill=true)
# Real contour of the function
p2 = contour(xc, yc, f, fill=true)
plot(p1, p2)
And this obviously didn't give the correct contour, although the interpolation itself seemed correct.
The problem was fixed by transposing gs_int(xs, ys):
p1 = contour(xs, ys, gs_int(xs, ys)', fill=true)
Then I randomly generated some points in 2D space, and repeated the same procedures:
using DelimitedFiles
using Interpolations
using Plots
gr()
data = readdlm("./random_points.txt", Float64)
# Create a dictionary to test different orders of interpolations.
inter = Dict("constant" => BSpline(Constant()),
             "linear" => BSpline(Linear()),
             "quadratic" => BSpline(Quadratic(Line(OnCell()))),
             "cubic" => BSpline(Cubic(Line(OnCell())))
)
x = range(-10, length=64, stop=10)
y = range(-10, length=64, stop=10)
v_unscaled = interpolate(data, inter["cubic"])
v = scale(v_unscaled, x, y)
# The contour of the data points
p0 = contour(x, y, data, fill=true)
display(p0)
# The contour of the interpolated function
p_int = contour(x, y, v(x,y)', fill=true)
display(p_int)
However, the two contour plots didn't look the same.
Once I removed the apostrophe after v(x,y), it worked:
p_int = contour(x, y, v(x,y), fill=true)
Now I don't get it. When should I apply transposition, and when shouldn't I do so?
That's because in your first example you plot a function, while in the second example you plot two arrays. The two arrays don't need to be transposed because they are oriented the same way. But in the first example, the way you generate the array is transposed relative to the way Plots generates an array from the 2-d function you pass it.
When you plot a function, Plots will calculate the outcome as g = Float64[f(x, y) for y in ys, x in xs], not the other way around as you did in your code. For a good discussion of transposes in plotting, refer to https://github.com/JuliaPlots/Makie.jl/issues/205

Backspin effect in pool game with SceneKit

I would like to create a realistic pool game and implement at least some basic ball effects. I started from scratch with SceneKit, and at this point I'm just studying whether it is the proper technology to go with; SceneKit would be ideal.
I managed to achieve an acceptable ball effect for sidespin and some sort of forward spin. The one I'm struggling with is backspin. I'm playing around with the position parameter of the applyForce method, but it seems that alone will not give me the result I'm looking for. Either I'm missing something (I've got limited knowledge of physics) or SceneKit's physics simulation is just not enough for what I want. Basically, I have a sphere of radius 1.5, and as I varied the Y component of the position vector from -1.5 to 1.5, the result is that either the white ball or the ball I'm hitting jumps when the collision occurs.
The first screenshot shows the moment of impact, while the second shows how the ball jumps after the collision.
The two spheres are configured like this
let sphereGeometry = SCNSphere(radius: 1.5)
sphere1 = SCNNode(geometry: sphereGeometry)
sphere1.position = SCNVector3(x: -15, y: 0, z: 0)
sphere2 = SCNNode(geometry: sphereGeometry)
sphere2.position = SCNVector3(x: 15, y: 0, z: 0)
And the code that gives me that effect is the following:
sphere1.physicsBody?.applyForce(SCNVector3Make(350, 0, 0), atPosition:SCNVector3Make(1.5, -0.25, 0), impulse: true)
What I'm trying to do in that code is to hit the ball roughly a bit below the centre. To get to -0.25, I took an angle of 10 degrees, computed its sine, and multiplied it by the sphere radius to get a point that lies right on the sphere's surface.
So I've been reading several papers/chapters about pool physics, and I think I found something that at least proves to me it can be done with SceneKit. What I was missing were (i) the right formulae and (ii) angular velocity. The physics still needs a lot of polish, but at least it roughly produces the trajectory one would expect when applying these effects. Here's the code in case anyone's interested:
//Cue strength
let strength : Float = 1000
//Cue mass expressed in terms of ball's mass
let cueMass : Float = self.balls[0].mass * 1.25
//White ball
let whiteBall = self.balls[0]
//The ball we are trying to hit
let targetBall = self.balls[1]
//White ball radius
let ballRadius = whiteBall.radius
//This should be in the range of {-R, R} where R is the ball radius. It determines how much off the center we would like to hit the ball along the z-axis. Produces left/right spin
let a : Float = 0
//This should be in the range of {-R, R} where R is the ball radius. It determines how much off the center we would like to hit the ball along the y-axis. Produces top/back spin
let b : Float = -ballRadius * 0.7
//This is calculated based off a and b and it is the position that we will be hitting the ball along the x-axis.
let c : Float = sqrt(ballRadius * ballRadius - a * a - b * b)
//This is the angle of the cue expressed in degrees. Values greater than zero will produce jump shots
let cueAngle : Float = 0
//Cue angle in radians for math functions
let cueAngleInRadians : Float = (cueAngle * 3.14) / 180
let cosAngle = cos(cueAngleInRadians)
let sinAngle = sin(cueAngleInRadians)
//Values to calculate the magnitude to be applied given the above variables
let m0 = a * a
let m1 = b * b * cosAngle * cosAngle
let m2 = c * c * sinAngle * sinAngle
let m3 = 2 * b * c * cosAngle * sinAngle
let w = (5 / (2 * ballRadius * ballRadius)) * (m0 + m1 + m2 + m3)
let n = 2 * whiteBall.mass * strength
let magnitude = n / (1 + whiteBall.mass / cueMass + w)
//We would like to point to the target ball
let targetVector = targetBall.position
//Get the unit vector of our target
var target = (targetVector - whiteBall.position).normal
//Multiply our direction by the force's magnitude. Y-axis component reflects the angle of the cue
target.x *= magnitude
target.y = (magnitude / whiteBall.mass) * sinAngle
target.z *= magnitude
//Apply the impulse at the given position by c, b, a
whiteBall.physicsBody?.applyForce(target, atPosition: SCNVector3Make(c, b, a), impulse: true)
//Values to calculate angular force
let i = ((2 / 5) * whiteBall.mass * ballRadius * ballRadius)
let wx = a * magnitude * sinAngle
let wy = -a * magnitude * cosAngle
let wz = -c * magnitude * sinAngle + b * magnitude * cosAngle
let wv = SCNVector3Make(wx, wy, wz) * (1 / i)
//Apply a torque
whiteBall.physicsBody?.applyTorque(SCNVector4Make(wv.x, wv.y, wv.z, 0.4), impulse: true)
Note that values of a, b, c should take into account the target vector's direction.
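For reference, here is my transcription of what the code above computes, written as formulas (m the ball mass, M the cue mass, V the cue strength, \theta the cue angle, (a, b, c) the impact offsets and R the ball radius):

F = \frac{2mV}{1 + \frac{m}{M} + \frac{5}{2R^2}\left(a^2 + b^2\cos^2\theta + c^2\sin^2\theta + 2bc\cos\theta\sin\theta\right)}

\vec{\omega} = \frac{1}{I}\begin{pmatrix} aF\sin\theta \\ -aF\cos\theta \\ -cF\sin\theta + bF\cos\theta \end{pmatrix},
\qquad I = \frac{2}{5}mR^2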

Mathematica - plotting an array with values and exponential function

I have a question about Mathematica. I have an array of values called tempDep:
{10.7072,11.5416,12.2065,12.774,13.2768,13.7328,14.1526,14.5436,14.9107,15.2577,15.5874,15.9022,16.2037,16.4934,16.7727,17.0425,17.3036,17.5569,17.803,18.0424,18.2756,18.503,18.725,18.9419,19.154,19.3615,19.5647,19.7637,19.9588,20.1501,20.3378,20.5219,20.7025,20.8799,21.0541,21.2252,21.3933,21.5584,21.7207,21.8801,22.0368,22.1908,22.3423,22.4911,22.6374,22.7813,22.9228,23.0619,23.1987,23.3332,23.4655,23.5955,23.7235,23.8493,23.973,24.0947,24.2143,24.332,24.4478,24.5616,24.6736,24.7837,24.892,24.9986,25.1034,25.2064,25.3078,25.4075,25.5055,25.602,25.6968,25.7901,25.8819,25.9722,26.061,26.1483,26.2342,26.3186,26.4017,26.4835,26.5638,26.6429,26.7207,26.7972,26.8724,26.9464,27.0192,27.0908,27.1612,27.2304,27.2986,27.3656,27.4315,27.4963,27.56,27.6227,27.6844,27.7451,27.8048,27.8635,27.9212,27.978,28.0338,28.0887,28.1428,28.1959,28.2482,28.2996,28.3502,28.3999,28.4488,28.497,28.5443,28.5908,28.6366,28.6817,28.726,28.7695,28.8124,28.8545,28.896,28.9368,28.9769,29.0163,29.0551,29.0933,29.1308,29.1678,29.2041,29.2398,29.2749,29.3095,29.3435,29.3769,29.4098,29.4421,29.474,29.5053,29.536,29.5663,29.5961,29.6254,29.6542,29.6825,29.7104,29.7378,29.7647,29.7913,29.8173,29.843,29.8682,29.893,29.9175,29.9415,29.9651,29.9883,30.0112,30.0336,30.0557,30.0775,30.0989,30.1199,30.1406,30.1609,30.1809,30.2006,30.22,30.239,30.2578,30.2762,30.2943,30.3121,30.3297,30.3469,30.3639,30.3806,30.397,30.4131,30.429,30.4446,30.4599,30.4751,30.4899,30.5045,30.5189,30.533,30.5469,30.5606,30.5741,30.5873,30.6003,30.6131,30.6257,30.6381,30.6503,30.6623,30.674,30.6856}
and I am plotting it using
ListPlot[tempDep]
What I want to do is to display this plot together with an exponential function (which should look pretty much the same as this ListPlot) in one graph. Can you help me out with this, please?
Perhaps something like this?
data = Table[Sin[x], {x, 0, 2 Pi, 0.3}];
Show[
  ListPlot[data, PlotStyle -> PointSize[0.02]],
  ListLinePlot[data,
    InterpolationOrder -> 2,
    PlotStyle -> Directive[Thick, Orange]]
]
You can use
f = Interpolation[tempDep]
and then plot the graph of the interpolating function with
Plot[f[x], {x, 1, 198}]
It seems to me that your data follow some other law, but if you want an exponential fit:
model = a + b Exp[c + d x];
tempDep1 = Partition[Riffle[Range@Length@tempDep, tempDep], 2];
fit = FindFit[tempDep1, model, {a, b, c, d}, x, Method -> NMinimize];
modelf = Function[{x}, Evaluate[model /. fit]]
Plot[modelf[t], {t, 0, Length@tempDep}, Epilog -> Point@tempDep1]
