MDP calculation - artificial-intelligence

How does the below calculation work?

When you are in state S_{n-2}, the optimal actions are
[a0, a0, {a0|a1}, {a0|a1}, {a0|a1}, ...]
which will give you this reward sequence:
[0.0, 0.0, 1.0, 1.0, 1.0, ...]
To get the optimal value in S_{n-2} you just have to discount the optimal rewards with γ:
γ^0*0.0 + γ^1*0.0 + γ^2*1.0 + γ^3*1.0 + γ^4*1.0 + ...
= γ^2 * (1.0 + γ + γ^2 + ...)
= γ^2 * V(G)
You get zero intermediate rewards before you reach the goal state G, so this is equivalent to discounting the value of G by two time steps.
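A quick numerical sanity check of that identity, assuming for the sake of example γ = 0.9 (so V(G) = 1/(1 − γ) = 10):

gamma = 0.9
V_G = 1.0 / (1.0 - gamma)          # V(G) = 1 + gamma + gamma**2 + ... = 10
V_closed = gamma**2 * V_G          # closed form for V(S_{n-2}) = 8.1
# brute-force discounted sum of the reward sequence [0, 0, 1, 1, 1, ...]
V_sum = sum(gamma**t * r for t, r in enumerate([0.0, 0.0] + [1.0] * 500))
print(V_closed, round(V_sum, 6))   # both ≈ 8.1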


Can I use a searching algorithm between arrays?

I have produced a plot of two arrays of the same size: the x-axis is a time array with varying time steps, and the y-axis is a pre-calculated array of delta values (the plot itself is omitted here).
Up until this point, I have been searching for the time at which delta = 0 (apart from time = 0). To do this, I have been manually going into the delta array to find at which step it reaches this value (e.g. delta = 0 at step 3,000), then going into the time array and retrieving the entry at the same step (i.e. the 3,000th entry).
Is there a way I can automate this and have it as an output rather than manually searching each time?
The base code is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
masses = [1, 1, 1]
r1, v1 = [0, 0], [-2*0.513938054919243, -2*0.304736003875733]
r2, v2 = [-1, 0], [0.513938054919243, 0.304736003875733]
r3, v3 = [1, 0], [0.513938054919243, 0.304736003875733]
u0 = np.concatenate([r1, v1, r2, v2, r3, v3])
def odesys(t, u):
    def force(a): return a / sum(a ** 2) ** 1.5
    r1, v1, r2, v2, r3, v3 = u.reshape([-1, 2])
    m1, m2, m3 = masses
    f12, f13, f23 = force(r1 - r2), force(r1 - r3), force(r2 - r3)
    a1, a2, a3 = -m2 * f12 - m3 * f13, m1 * f12 - m3 * f23, m1 * f13 + m2 * f23
    return np.concatenate([v1, a1, v2, a2, v3, a3])
# collect data
t_values = []
u_values = []
par_1_pos = []
d_values = []
# Time start, finish point, and step size
t0, tf, t_step = 0, 18, 0.0001
nsteps = int((tf - t0) / t_step)
solution = integrate.RK45(odesys, t0, u0, tf, max_step=t_step)
# Record the initial state, then run the Runge-Kutta stepper over the time period.
u_values.append(solution.y)
t_values.append(t0)
par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
                ((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
                ((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
                ((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
                ((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
                ((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
for step in range(nsteps):
    solution.step()
    u_values.append(solution.y)
    t_values.append(solution.t)
    par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
    d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
                    ((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
                    ((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
                    ((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
                    ((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
                    ((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
    # break the loop once the integrator reaches tf
    if solution.status == 'finished':
        break
# Plotting of the individual particles
u = np.asarray(u_values).T
# Plot of the trajectories of the three bodies over the time period
plt.plot(u[0], u[1], '-o', lw=1, ms=3, label="body 1")
plt.plot(u[4], u[5], '-x', lw=1, ms=3, label="body 2")
plt.plot(u[8], u[9], '-s', lw=1, ms=3, label="body 3")
plt.title('Trajectories of the three bodies')
plt.xlabel('X Position')
plt.ylabel('Y Position')
plt.legend()
plt.grid()
plt.show()
plt.close()
# Plot for d(delta_t) values
plt.plot(t_values, d_values)
plt.title('Delta number for the three bodies')
plt.xlabel('Time (s)')
plt.ylabel('Delta')
plt.grid()
plt.show()
plt.close()
# Plot of distance between P1 and IC
plt.plot(t_values, par_1_pos)
plt.title('Plot of distance between P1 and IC')
plt.xlabel('Time (s)')
plt.ylabel('Distance from origin')
plt.grid()
plt.show()
plt.close()
Make your life less complicated: use the provided time-loop routine solve_ivp.
You want to compute a point of minimal distance to the initial point u0. This is a point where the derivative of the distance changes sign from negative to positive. The derivative of the square of the distance is the dot product of the tangent vector and the difference vector. Close to the minimum it makes little difference whether the tangent vector is taken at the current point or at the initial point.
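In symbols: d/dt ‖u(t) − u0‖² = 2 (u(t) − u0) · u′(t), and near the minimum u′(t) ≈ u′(t0) = Tu0, which is exactly the dot product the event below computes.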
Thus define an event of the "counting" type
from scipy.integrate import solve_ivp   # needed below; not imported in the original script

Tu0 = odesys(t0, u0)
def dist_plane(u): return Tu0.dot(u - u0)
event0 = lambda t, u: dist_plane(u)
event0.direction = 1
and call the solver with its standard stepper RK45 and all the parameters
solution = solve_ivp(odesys, [t0,tf], u0, events=event0, atol=1e-10, rtol=1e-11)
u_values = solution.y.T
t_values = solution.t
def norm(u): return sum(u**2)**0.5
def dist_1(u): return norm(u[:2]-u0[:2])
def dist_all(u): return sum(norm(uu) for uu in (u-u0).reshape([-1,2]))
par_1_pos = [ dist_1(uu) for uu in u_values]
d_values = [dist_all(uu) for uu in u_values]
d_plane = [dist_plane(uu) for uu in u_values]
The last lines are there so you can still do all the plots as before.
For the minimum, however, you just have to evaluate the event fields of the returned structure. Printing the distance measures results in a table:
for tt, uu in zip(solution.t_events[0], solution.y_events[0]):
    print(rf"|{tt:12.8f} | {dist_1(uu):12.8f} | {dist_all(uu):12.8f} | {dist_plane(uu):12.8f} |")
|           t |   dist pos1 |    dist all |  dist deriv |
|  0.00000000 |  0.00000000 |  0.00000000 |  0.00000000 |
|  2.81292221 |  0.58161236 |  3.54275380 |  0.00000000 |
|  4.17037860 |  0.35583855 |  5.77531098 |  0.00000000 |
|  5.71976151 |  0.63111430 |  3.98764796 | -0.00000000 |
|  8.66440460 |  0.00000019 |  3.73331800 |  0.00000000 |
| 11.60904921 |  0.63111445 |  3.98764485 | -0.00000000 |
| 13.15843018 |  0.35583804 |  5.77530951 |  0.00000000 |
| 14.51588605 |  0.58161284 |  3.54275265 | -0.00000000 |
| 17.32881023 |  0.00000078 |  0.00000328 |  0.00000000 |
This shorter list should be easier to search for the global minimum.
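For instance, a minimal sketch of that final search (skipping the trivial event at t = 0; solution and dist_all as defined above):

import numpy as np
event_d = [dist_all(uu) for uu in solution.y_events[0][1:]]  # distance at each later event
i_min = int(np.argmin(event_d)) + 1                          # +1 undoes the slice offset
print(solution.t_events[0][i_min])                           # time of closest return, ≈ 17.33 here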
If the arrays are the same length, you can reuse the index from your search. Otherwise, you may be able to use the relative position in the first array as your starting point for the search of the second array (position 1000 in an array of 3000 is one third of the way through, so about position 1166 when mapped into an array of 3500).
If you want any value (or there will only be one), you can bisect the data with a binary search.
If you want the first and there may be more than one value, you'll have to do a linear search.
If your x values are time steps, then convert them to time values first by summing them up (for that the array needs to be sorted by time first, which it would be insane not to have already in this case).
If your x values are sorted, then use binary search; if not, use linear search.
If your searched value is not exactly present in your data but you have points below and above it, you can simply interpolate it from the closest neighbours (linear, cubic or higher-order interpolation); see the sketch below. For that it is best if the searched array is sorted, otherwise you will have a hard time getting valid closest neighbours.
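As a sketch of that last point, assuming t_values and d_values are the two same-length arrays from the question and that delta actually changes sign between samples (if it only touches zero at a minimum, look for local minima instead, e.g. with scipy.signal.argrelmin):

import numpy as np

def zero_crossing_times(t, delta):
    # linearly interpolated times at which delta crosses zero between samples
    t, delta = np.asarray(t), np.asarray(delta)
    s = np.sign(delta)
    idx = np.where(s[:-1] * s[1:] < 0)[0]              # sign change between i and i+1
    frac = -delta[idx] / (delta[idx + 1] - delta[idx])
    return t[idx] + frac * (t[idx + 1] - t[idx])

# self-contained example: sin(t) crosses zero near t = pi
tt = np.linspace(0.01, 6.0, 600)
print(zero_crossing_times(tt, np.sin(tt)))             # [3.14159...]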

I’m confused with how to convert RGB to YCrCb

Given that:
Y = 0.299R + 0.587G + 0.114B
What values do we put in for R, G, and B? I'm assuming 0-255. For argument's sake, if R, G, B are each 50, then does it mean Y = 0.299(50) + 0.587(50) + 0.114(50)?
The next two are also confusing. How can B − Y even be possible? If Y already contains blue, isn't B − Y just taking blue away from itself?
Cb = 0.564( B − Y )
Cr =0.713(R−Y)
It's just simple (confusing) math ...
Remark: there are a few standards of YCbCr; the following formulas apply to BT.601 with "full range":
Equation (1): Y = 0.299R + 0.587G + 0.114B
The common definition of YCbCr assumes that R, G and B are 8-bit unsigned integers in the range [0, 255].
There are cases where R, G, B are floating point values in range [0, 1] (normalized values).
There are also HDR cases where range is [0, 1023] for example.
In case R=50, G=50, B=50, you just need to assign the values:
Y = 0.299*50 + 0.587*50 + 0.114*50
Result: Y = 50.
Since Y represents the luma (roughly, perceived brightness), and RGB = (50, 50, 50) is a gray pixel, it does make sense that Y = 50.
The following equations Cb = 0.564(B - Y), Cr = 0.713(R - Y) are incorrect.
Instead of Cb, and Cr they should be named Pb and Pr.
Equation (2): Pb = 0.564(B - Y)
Equation (3): Pr = 0.713(R - Y)
The equations mean that you can compute Y first, and then use the result for computing Pb and Pr.
Remark: don't round the value of Y when you are using it for computing Pb and Pr.
You can also substitute Equation (1) into Equations (2) and (3):
Pb = 0.564(B - Y) = 0.564(B - (0.299R + 0.587G + 0.114B)) = 0.4997*B - 0.3311*G - 0.1686*R
Pr = 0.713(R - Y) = 0.713(R - (0.299R + 0.587G + 0.114B)) = 0.4998*R - 0.4185*G - 0.0813*B
There are some small inaccuracies.
Wikipedia is more accurate (but still just a result of mathematical assignments):
Y = 0.299*R + 0.587*G + 0.114*B
Pb = -0.168736*R - 0.331264*G + 0.5*B
Pr = 0.5*R - 0.418688*G - 0.081312*B
In the above formulas the range of Pb, Pr is [-127.5, 127.5].
In the "full range" formula of YCbCr (not YPbPr), an offset of 128 is added to Pb and Pr (so result is always positive).
In case of 8 bits, the final result is limited to range [0, 255] and rounded.
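For concreteness, a minimal NumPy sketch of the full-range BT.601 conversion described above (the function name and the rounding/clipping policy are my own choices):

import numpy as np

def rgb_to_ycbcr(rgb):
    # full-range BT.601; expects 8-bit RGB values in an array of shape (..., 3)
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0  # Pb plus the 128 offset
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0   # Pr plus the 128 offset
    out = np.stack([y, cb, cr], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)  # round and limit last

print(rgb_to_ycbcr([50, 50, 50]))  # gray pixel -> [ 50 128 128]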
What you're referencing is the conversion of RGB to YPbPr.
Conversion to YCbCr is as follows:
Y = 0.299 * R + 0.587 * G + 0.114 * B
Cb = -0.1687 * R - 0.3313 * G + 0.5 * B + 128
Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128
Yours is YPbPr (which is better for JPEG Compression, see below):
delta = 0.5 (for R, G, B normalized to [0, 1]; use delta = 128 for 8-bit values)
Y = 0.299 * R + 0.587 * G + 0.114 * B (same as above)
Pb = (B - Y) * 0.564 + delta
Pr = (R - Y) * 0.713 + delta
The above answer did a better job of explaining this.
I've been looking into JPEG Compression for implementation in Pytorch and found this thread (https://photo.stackexchange.com/a/8357) useful to explain why we use YPbPr for compression over YCbCr.
The Pb and Pr versions are better for image compression because the luminance information (which carries the most detail) is retained in a single channel (Y), while Pb and Pr contain the chrominance information. Thus when you do chroma down-sampling later in the pipeline, there is less loss of valuable information.

Pytorch - Getting gradient for intermediate variables / tensors

As an exercise in the PyTorch framework (0.4.1), I am trying to display the gradient of X (gX or dSdX) in a simple linear layer (Z = X.W + B). To simplify my toy example, I call backward() on a sum of Z (not on a loss).
To sum up, I want gX (dSdX) of S = sum(XW + B).
The problem is that the gradient of Z (dSdZ) is None. As a result, gX is of course wrong as well.
import torch
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
Result:
Z:
tensor([[2.1500, 2.9100],
[1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
None
gX:
tensor([[ 3.6000, -0.9000, 1.3000],
[ 3.6000, -0.9000, 1.3000]])
I have exactly the same result if I use nn.Module as below:
class Net1Linear(torch.nn.Module):
    def __init__(self, wi, wo, W, B):
        super(Net1Linear, self).__init__()
        self.linear1 = torch.nn.Linear(wi, wo)
        self.linear1.weight = torch.nn.Parameter(W.t())
        self.linear1.bias = torch.nn.Parameter(B)
    def forward(self, x):
        return self.linear1(x)
net = Net1Linear(3,2,W,B)
Z = net(X)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
First of all, you only get gradients computed for tensors on which you enabled them by setting requires_grad to True.
So your output is just as one would expect: you get the gradient for X.
PyTorch does not save the gradients of intermediate results, for performance reasons; you will only get the gradient for those tensors for which you set requires_grad to True.
However, you can use register_hook to extract the intermediate gradient during the calculation or to save it manually. Here I just save it to the grad attribute of the tensor Z:
import torch
# function to extract grad
def set_grad(var):
    def hook(grad):
        var.grad = grad
    return hook
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
# register_hook for Z
Z.register_hook(set_grad(Z))
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)
This will output:
Z:
tensor([[2.1500, 2.9100],
[1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
tensor([[1., 1.],
[1., 1.]])
gX:
tensor([[ 3.6000, -0.9000, 1.3000],
[ 3.6000, -0.9000, 1.3000]])
Hope this helps!
Btw.: normally you would want the gradient to be activated for your parameters, i.e. your weights and biases. Because with the gradient on X, what the optimizer would do right now is alter your inputs X, not your weights W and bias B. So usually the gradient is activated for W and B in such a case.
There's a much simpler way. Simply use retain_grad():
https://pytorch.org/docs/stable/autograd.html#torch.Tensor.retain_grad
Z.retain_grad()
before calling backward()
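A minimal self-contained sketch of this variant (the toy values are my own):

import torch

X = torch.tensor([[0.5, 0.3, 2.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
Z = X @ W          # intermediate tensor; Z.grad would normally stay None
Z.retain_grad()    # ask autograd to keep Z's gradient after backward()
Z.sum().backward()
print(Z.grad)      # tensor([[1., 1.]])
print(X.grad)      # tensor([[ 3.6000, -0.9000,  1.3000]])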
blue-phoenox, thanks for your answer. I am pretty happy to have heard about register_hook().
What led me to think that I had a wrong gX is that it was independent of the values of X. Doing the math, this is actually expected: for S = sum(XW + B), gZ is all ones, so gX = gZ·Wᵀ has identical rows built from W alone. But using a cross-entropy loss instead of sum() makes things much cleaner, so I updated the example for those who might be interested. Using sum() was a bad idea in this case.
T_dec = torch.tensor([0, 1])
X = torch.tensor([[0.5, 0.8, 2.1], [0.7, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.7, 0.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
print("Z:\n", Z)
L = torch.nn.CrossEntropyLoss()(Z,T_dec)
Z.register_hook(lambda gZ: print("gZ:\n",gZ))
L.backward()
print("gX:\n", X.grad)
Result:
Z:
tensor([[1.7500, 2.6600],
[3.0700, 1.3100]], grad_fn=<ThAddmmBackward>)
gZ:
tensor([[-0.3565, 0.3565],
[ 0.4266, -0.4266]])
gX:
tensor([[-0.7843, 0.6774, 0.3209],
[ 0.9385, -0.8105, -0.3839]])

Backspin effect in pool game with SceneKit

I would like to create a realistic pool game and to implement at least some basic ball effects. I started from scratch with SceneKit, and at this point I'm just evaluating whether it is the proper technology to go with; SceneKit would be ideal.
I managed to achieve an acceptable ball effect for sidespin and some sort of forward spin. The one I'm struggling with is backspin. I'm playing around with the position parameter of the applyForce method, but it seems that alone will not give me the result I'm looking for. Either I'm missing something (I have limited knowledge of physics) or SceneKit's physics simulation is just not enough for what I want. Basically I have a sphere of radius 1.5 and I went from -1.5 to 1.5 on the Y component of the position vector, and the result is that either the white ball or the ball I'm hitting jumps when the collision occurs.
The first screenshot shows the moment of impact, while the second shows how the ball jumps after the collision (screenshots not reproduced here).
The two spheres are configured like this
let sphereGeometry = SCNSphere(radius: 1.5)
sphere1 = SCNNode(geometry: sphereGeometry)
sphere1.position = SCNVector3(x: -15, y: 0, z: 0)
sphere2 = SCNNode(geometry: sphereGeometry)
sphere2.position = SCNVector3(x: 15, y: 0, z: 0)
And the code that gives me that effect is the following:
sphere1.physicsBody?.applyForce(SCNVector3Make(350, 0, 0), atPosition:SCNVector3Make(1.5, -0.25, 0), impulse: true)
What I'm trying to do in that code is to hit the ball roughly a bit below the center. How I got to -0.25 was to take an angle of 10 degrees, compute its sine, and multiply it by the sphere radius so as to get a point that lies right on the sphere's surface.
So I've been reading several papers/chapters about pool physics, and I think I found something that at least proves to me it can be done with SceneKit. What I was missing was (i) the right formulae and (ii) angular velocity. The physics still need a lot of polish, but at least it seems to give roughly the trajectory one would expect when applying these effects. Here's the code, in case anyone's interested:
//Cue strength
let strength : Float = 1000
//Cue mass expressed in terms of ball's mass
let cueMass : Float = self.balls[0].mass * 1.25
//White ball
let whiteBall = self.balls[0]
//The ball we are trying to hit
let targetBall = self.balls[1]
//White ball radius
let ballRadius = whiteBall.radius
//This should be in the range of {-R, R} where R is the ball radius. It determines how much off the center we would like to hit the ball along the z-axis. Produces left/right spin
let a : Float = 0
//This should be in the range of {-R, R} where R is the ball radius. It determines how much off the center we would like to hit the ball along the y-axis. Produces top/back spin
let b : Float = -ballRadius * 0.7
//This is calculated based off a and b and it is the position that we will be hitting the ball along the x-axis.
let c : Float = sqrt(ballRadius * ballRadius - a * a - b * b)
//This is the angle of the cue expressed in degrees. Values greater than zero will produce jump shots
let cueAngle : Float = 0
//Cue angle in radians for math functions
let cueAngleInRadians : Float = cueAngle * Float.pi / 180
let cosAngle = cos(cueAngleInRadians)
let sinAngle = sin(cueAngleInRadians)
//Values to calculate the magnitude to be applied given the above variables
let m0 = a * a
let m1 = b * b * cosAngle * cosAngle
let m2 = c * c * sinAngle * sinAngle
let m3 = 2 * b * c * cosAngle * sinAngle
let w = (5 / (2 * ballRadius * ballRadius)) * (m0 + m1 + m2 + m3)
let n = 2 * whiteBall.mass * strength
let magnitude = n / (1 + whiteBall.mass / cueMass + w)
//We would like to point to the target ball
let targetVector = targetBall.position
//Get the unit vector of our target
var target = (targetVector - whiteBall.position).normal
//Multiply our direction by the force's magnitude. Y-axis component reflects the angle of the cue
target.x *= magnitude
target.y = (magnitude / whiteBall.mass) * sinAngle
target.z *= magnitude
//Apply the impulse at the given position by c, b, a
whiteBall.physicsBody?.applyForce(target, atPosition: SCNVector3Make(c, b, a), impulse: true)
//Values to calculate angular force
let i = ((2 / 5) * whiteBall.mass * ballRadius * ballRadius)
let wx = a * magnitude * sinAngle
let wy = -a * magnitude * cosAngle
let wz = -c * magnitude * sinAngle + b * magnitude * cosAngle
let wv = SCNVector3Make(wx, wy, wz) * (1 / i)
//Apply a torque
whiteBall.physicsBody?.applyTorque(SCNVector4Make(wv.x, wv.y, wv.z, 0.4), impulse: true)
Note that values of a, b, c should take into account the target vector's direction.
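For reference, written out as formulas, the magnitude and angular velocity the code above computes are (with I = (2/5) m R², the moment of inertia of a solid sphere; this is just a restatement of the snippet, not an independent derivation):

F = 2 m V / (1 + m/M + (5 / (2R²)) (a² + b² cos²θ + c² sin²θ + 2 b c cosθ sinθ))
ω = (1/I) (a F sinθ, −a F cosθ, −c F sinθ + b F cosθ)

where m is the ball mass, M the cue mass, V the cue strength, θ the cue angle, and (a, b, c) the hit offsets described in the comments above.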

Triangle texture mapping with barycentric coordinates

I want to texture-map triangles for 3D rendering in my ray tracer.
I am using barycentric coordinates to locate points on the triangles.
But the result isn't correct.
This is what I did:
3 triangle vertices : a b c
1 intersection point (Raytracing) : p
vertex coordinates are a.x a.y a.z / b.x b.y b.z / c.x c.y c.z
texture coordinates are a.tx a.ty / b.tx b.ty / c.tx c.ty
Firstly, I get the distance between the intersection point and each of the triangle's vertices:
d1 = get_distance(p, a);
d2 = get_distance(p, b);
d3 = get_distance(p, c);
Secondly, I calculate the barycentric ratio for each vertex:
r1 = d1 / (d1 + d2 + d3);
r2 = d2 / (d1 + d2 + d3);
r3 = d3 / (d1 + d2 + d3);
And finally, I determine the coordinates of the point on the texture image:
x_img = (r1 * a.tx) + (r2 * b.tx) + (r3 * c.tx);
y_img = (r1 * a.ty) + (r2 * b.ty) + (r3 * c.ty);
But it doesn't work.
Can you explain to me why, please?
The render of the Kick-Ass head: http://imgur.com/3LAjrWm
Your barycentric calculation seems wrong: from how I read your code, a vertex influences the result more the farther the intersection point is from it, when it should be the opposite.
I guess it should be something of the form
r1 = (1/d1) / ( (1/d1) + (1/d2) + (1/d3) )
although that would have problems when a divisor is zero or close to zero. Have a look at http://en.wikipedia.org/wiki/Barycentric_coordinate_system#Converting_to_barycentric_coordinates for a better way to calculate this.
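As a concrete sketch of one standard approach (the dot-product formulation; the helper name barycentric is mine), written in Python for brevity:

import numpy as np

def barycentric(p, a, b, c):
    # barycentric coordinates (u, v, w) of point p with respect to triangle (a, b, c)
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01        # zero only for a degenerate triangle
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w             # weights for a, b, c

# interpolating texture coordinates at the hit point p:
a, b, c = np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])
r1, r2, r3 = barycentric(np.array([0.25, 0.25, 0.]), a, b, c)
# x_img = r1 * a.tx + r2 * b.tx + r3 * c.tx, and likewise for y_img
print(r1, r2, r3)                        # 0.5 0.25 0.25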
