UPDATE:
I went around the problem with a DataFrame:
import pandas as pd
import numpy as np
dict = {'x0':[1,1,1,1,1],'x1':[2,3,5,7,8],'x2':[1,5,3,6,7], 'y':[3,2,4,5,8]}
df = pd.DataFrame(dict)
# y = β(0) + β1x1 + β2x2
X = df[['x0','x1','x2']].to_numpy()
Y = df[['y']].to_numpy()
X_transpose = (X.transpose())
beta_hats = np.linalg.inv(X_transpose.dot(X)).dot(X_transpose.dot(Y))
print(beta_hats)
df = pd.DataFrame(beta_hats)
df.rename(columns = {0:'Beta_Hats'}, inplace = True)
print(df)
I wrote the following program to find the beta coefficients from a set of matrices with NumPy. When I converted the resulting array to a list, I ran into a problem: some of the decimals were slightly off:
Array output:
[[ 0.5 ]
[ 1. ]
[-0.25]]
list output: [[0.49999999999999784], [1.0000000000000022], [-0.2500000000000009]]
I am aware Python has some limitations with floating-point calculations, but I was wondering if anyone has found a way around this. Any help would be much appreciated! I haven't been coding for long (since May), so sorry if this seems a bit simple to some of you:
import pandas as pd
import numpy as np
dict = {'x0':[1,1,1,1,1],'x1':[2,3,5,7,8],'x2':[1,5,3,6,7], 'y':[3,2,4,5,8]}
df = pd.DataFrame(dict)
X = df[['x0','x1','x2']].to_numpy()
Y = df[['y']].to_numpy()
X_transpose = (X.transpose())
beta_hats = np.linalg.inv(X_transpose.dot(X)).dot(X_transpose.dot(Y))
print(beta_hats)
list = beta_hats.tolist()
print(list)
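The small differences come from binary floating-point arithmetic rather than from the conversion itself: printing a NumPy array rounds to 8 digits of precision by default, while tolist() returns Python floats that print at full precision. A minimal sketch, assuming that rounding for display is acceptable, and swapping in np.linalg.lstsq for the explicit inverse used in the question (the name beta_list is only illustrative):

```python
import numpy as np

# Same data as in the question, built directly as arrays.
X = np.array([[1, 2, 1],
              [1, 3, 5],
              [1, 5, 3],
              [1, 7, 6],
              [1, 8, 7]], dtype=float)
Y = np.array([[3], [2], [4], [5], [8]], dtype=float)

# Solve the least-squares problem without forming an explicit inverse.
beta_hats, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Round before converting, so the list shows the same values as the array printout.
beta_list = np.round(beta_hats, 8).tolist()
print(beta_list)
```

The rounding only changes what is displayed or stored in the list; the underlying estimates are still accurate to about 15 significant digits either way.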
How can I write code that shows the index at which Newdate1 and Newdate2 are located within Setups? The value for Newdate1 within Setups is the second index which outputs 1 for result. However, the np.where function does not work. How could I do this without a for loop?
import numpy as np
Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
'2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
'2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
'2017-12-22T03:56:00.000000000'],dtype="datetime64[ns]")
Newdate1 = np.array(['2017-09-15T07:11:00.000000000'], dtype="datetime64[ns]")
Newdate2 = np.array(['2017-12-22T03:26:00.000000000'], dtype="datetime64[ns]")
result = np.where(Setups == Newdate1)
result2 = np.where(Setups == Newdate2)
Expected Output:
result: 1
result2: 4
Use np.in1d to test which elements of the searched array appear in the other array, then get their indices with np.where.
import numpy as np
Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
'2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
'2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
'2017-12-22T03:56:00.000000000'],dtype="datetime64[ns]")
newdates = np.array(['2017-09-15T07:11:00.000000000','2017-12-22T03:26:00.000000000'],dtype="datetime64[ns]")
print(np.where(np.in1d(Setups,newdates)))
output:
(array([0, 4]),)
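As far as I know, np.in1d has been deprecated since NumPy 2.0 in favor of np.isin, so the same lookup can be sketched as follows. The names idx and idx1 are only illustrative, and np.flatnonzero is used here to get a plain index array instead of the tuple that np.where returns:

```python
import numpy as np

Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
                   '2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
                   '2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
                   '2017-12-22T03:56:00.000000000'], dtype="datetime64[ns]")
newdates = np.array(['2017-09-15T07:11:00.000000000',
                     '2017-12-22T03:26:00.000000000'], dtype="datetime64[ns]")

# Boolean mask of the positions in Setups whose value occurs in newdates,
# converted to a flat array of indices.
idx = np.flatnonzero(np.isin(Setups, newdates))
print(idx)

# Looking up a single date works with a plain comparison.
idx1 = np.flatnonzero(Setups == newdates[0])
print(idx1)
```

Note that the first date is the first element of Setups, so its zero-based index is 0, not 1.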
I have this code, and my aim is to calculate the sine of my raster raised to the power of 0.8.
import os
os.chdir('D:/NOA/Soil_Erosion/test_Project/Workspace/Input_Data_LS_Factor')
import rasterio
import math
data = rasterio.open('Slope_degrees_clipped.tif')
band = data.read(1) # array of float32 with size (3297,2537)
w = band.shape[0]
print(w)
h = band.shape[1]
print(h)
dtypes =data.dtypes[0]
band_calc = math.sin(band)**0.8 # the formula I would like to calculate
However, the following error pops up:
TypeError: only size-1 arrays can be converted to Python scalars
Do you know how I should fix this?
P.S. I tried to vectorize it (np.vectorize()), but it does not work as it needs a real number.
When I use np.ndarray.flatten(band), the same error occurs.
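math.sin accepts only a single number, which is why passing a whole array raises this error; NumPy's np.sin applies the sine to every element instead. A minimal sketch, using a small made-up array in place of the raster band and assuming the values are slopes in degrees (as the file name suggests), so they are converted to radians first:

```python
import numpy as np

# Hypothetical stand-in for the raster band: slope values in degrees.
band = np.array([[0.0, 30.0],
                 [45.0, 90.0]], dtype=np.float32)

# Convert degrees to radians, take the sine element-wise,
# then raise every element to the power 0.8.
band_calc = np.sin(np.radians(band)) ** 0.8
print(band_calc.shape)
```

If the band contained negative angles, the sine would be negative and the fractional power would produce nan for those cells, so that case would need separate handling.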
I found the solution on Geographic Information Systems:
import os
os.chdir('D:/NOA/Soil_Erosion/test_Project/Workspace/Input_Data_LS_Factor')
import rasterio
import math
data = rasterio.open('Slope_degrees_clipped.tif')
from rasterio.plot import show
show(data)
band = data.read(1) # array of float32 with size (3297,2537)
w = band.shape[0]
print(w)
h = band.shape[1]
print(h)
dtypes =data.dtypes[0]
Calculate the sine of the raster raised to the power of 0.8:
import numpy as np
band_calc2 = np.sin(band)**0.8 # the formula I would like to calculate
"""
another way to do it
band_calc = [ [] for i in range(len(band)) ]
for i,row in enumerate(band):
    for element in row:
        band_calc[i].append(math.sin(element*math.pi/180)**0.8)
"""
I have four "principal" spectra for which I want to find coefficients (scalars) that best fit my data. The goal is to know how much of each principal spectrum is in the data. I am trying to get the "percent composition" of each principal spectrum in the overall spectrum (i.e. 50% a1, 25% a2, 20% a3, 5% a4).
#spec = spectrum, a1,a2,a3,a4 = principal components these are all nx1 dimensional arrays
c = 0 #some scalar
d = 0 #some scalar
e = 0 #some scalar
g = 0 #some scalar
def f(spec, c, d, e, g):
    y = spec - (a1.multiply(c) - a2.multiply(d) - a3.multiply(e)- a4.multiply(g))
    return np.dot(y, y)
res = optimize.minimize(f, spec, args=(c,d,e,g), method='COBYLA', options={'rhobeg': 1.0, 'maxiter': 1000, 'disp': False, 'catol': 0.0002}) #z[0], z[1], z[2], z[3]
best = res['x']
The issue I'm having is that it doesn't give me the scalar values (c, d, e, g) but instead another n×1 array. Any help is greatly appreciated. I am also open to other minimization or fitting techniques.
After some work, I found two methods that give similar results for this problem.
import numpy as np
import pandas as pd
import csv
import os
from scipy import optimize
path = '[insert path]'
os.chdir(path)
data = 'data.csv' #original spectra
factors = 'factors.csv' #factor spectra
nfn = 'weights.csv' #new filename
df_data = pd.read_csv(data, header = 0) # read in the spectrum file
df_factors = pd.read_csv(factors, header = 0)
# this array a is going to be our factors
a = df_factors[['0','1','2','3']]
Next, separate the factor spectra from the original data frame:
a1 = pd.Series(a['0'])
a2 = pd.Series(a['1'])
a3 = pd.Series(a['2'])
a4 = pd.Series(a['3'])
b = df_data[['0.75M']] # original spectrum!
b = pd.Series(b['0.75M']) # needs to be in a series
x0 is my initial guess for the coefficients:
x0 = np.array([0., 0., 0.,0.])
def f(c):
    return b -((c[0]*a1)+(c[1]*a2)+(c[2]*a3)+(c[3]*a4))
First I fit using least_squares from scipy.optimize,
and then later with minimize from the same package. Both work; minimize is slightly better IMO.
res = optimize.least_squares(f, x0, bounds = (0, np.inf))
xbest = res.x
x0 = np.array([0., 0., 0., 0.])
def f(c):
    y = b -((c[0]*a1)+(c[1]*a2)+(c[2]*a3)+(c[3]*a4))
    return np.dot(y,y)
res = optimize.minimize(f, x0, bounds = ((0,np.inf),(0,np.inf),(0,np.inf),(0,np.inf)))
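Since the model is linear in the coefficients and they are constrained to be non-negative, a third option is non-negative least squares. The following is only a sketch on synthetic data: the matrix A, the vector true_weights, and the random seed are made up for illustration and stand in for the factor spectra and the measured spectrum; scipy.optimize.nnls is swapped in for the two methods above. Normalizing the weights by their sum is one possible way to express them as a "percent composition".

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)

# Hypothetical factor spectra: 50 sample points, one factor per column.
A = rng.random((50, 4))

# Hypothetical "true" mixing weights used to build a synthetic spectrum.
true_weights = np.array([0.5, 0.25, 0.2, 0.05])
b = A @ true_weights

# Minimize ||A @ w - b|| subject to w >= 0.
weights, residual_norm = optimize.nnls(A, b)

# Express each weight as a share of the total.
percent = 100 * weights / weights.sum()
print(np.round(percent, 2))
```

With real, noisy spectra the recovered weights will not match any "true" values exactly, and whether the normalized shares are physically meaningful depends on how the factor spectra are scaled.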
I am trying to print two different lists with NumPy and pandas respectively.
The strange thing is that I can only print one list at a time, by commenting out the other one with all its associated code. Do NumPy and pandas have any dependencies on each other?
import numpy as np
import pandas as pd
np.array = []
for i in range(7):
    np.array.append([])
    np.array[i] = i
values = np.array
print(np.power(np.array,3))
df = pd.DataFrame({'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]})
print(df)
I'm not sure what you mean by "I can only print one list at a time by commenting the other one with all its associated code", but any strange behavior you're seeing probably comes from assigning to np.array, which overwrites NumPy's array function. You should name your variable something different, e.g. arr. Perhaps you were trying to do this:
arr = []
for i in range(7):
    arr.append([])
    arr[i] = i
values = np.array(arr)
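Assigning a list to np.array replaces the function on the numpy module for the rest of the session, which may also affect any other code that looks up np.array there. A minimal sketch of the same computation that leaves np.array alone and builds the values with np.arange (the name cubes is only illustrative):

```python
import numpy as np

# Integers 0..6 as a NumPy array, without touching np.array itself.
values = np.arange(7)

# Cube every element.
cubes = np.power(values, 3)
print(cubes)
```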
I was solving a problem using Python, and my code is:
from math import sin,pi
import numpy
import numpy as np
import pylab
N=20
x = np.linspace(0,1, N)
def v(x):
    return 100*sin(pi*x)
#set up initial condition
u0 = [0.0] # Boundary conditions at t= 0
for i in range(1,N):
    u0[i] = v(x[i])
I would then like to plot the results by updating v(x) over range(0, N). It looks simple, but perhaps you could help, since it gives me an error like:
Traceback (most recent call last):
File "/home/universe/Desktop/Python/sample.py", line 13, in <module>
u0[i] = v(x[i])
IndexError: list assignment index out of range
You could change u0[i] = v(x[i]) to u0.append(v(x[i])). But it is more elegant to write:
u0 = [v(xi) for xi in x]
Indices i are bug magnets.
Since you are using numpy, I'd suggest using np.vectorize. That way you can pass the array x directly to the function and the function will return an array of the same size with the function applied on each element of the input array.
from math import sin,pi
import numpy
import numpy as np
import pylab
N=20
x = np.linspace(0,1, N)
def v(x):
    return 100*sin(pi*x)
vectorized_v = np.vectorize(v) #so that the function takes an array of x's and returns an array again
u0 = vectorized_v(x)
Out:
array([ 0.00000000e+00, 1.64594590e+01, 3.24699469e+01,
4.75947393e+01, 6.14212713e+01, 7.35723911e+01,
8.37166478e+01, 9.15773327e+01, 9.69400266e+01,
9.96584493e+01, 9.96584493e+01, 9.69400266e+01,
9.15773327e+01, 8.37166478e+01, 7.35723911e+01,
6.14212713e+01, 4.75947393e+01, 3.24699469e+01,
1.64594590e+01, 1.22464680e-14])
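np.vectorize is documented as a convenience wrapper that is essentially a loop, so for a formula like this one it may be simpler to use NumPy's own element-wise functions. A minimal sketch, assuming the same N and grid as above:

```python
import numpy as np

N = 20
x = np.linspace(0, 1, N)

# np.sin and np.pi operate on the whole array at once,
# so no explicit loop or np.vectorize wrapper is needed.
u0 = 100 * np.sin(np.pi * x)
print(u0.shape)
```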
u0 is a list with one element, so you can't assign values to indices that don't exist yet. Instead, make it a dictionary (called u here):
u = {}
u[0] = 0.0
for i in range(1,N):
    u[i] = v(x[i])
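If the index-based loop should be kept, another possibility is to create all N slots before assigning to them. A minimal sketch, assuming a preallocated NumPy array of zeros is acceptable in place of the list:

```python
from math import sin, pi
import numpy as np

N = 20
x = np.linspace(0, 1, N)

def v(x):
    return 100 * sin(pi * x)

# All N entries exist up front, so u0[i] is a valid assignment target.
# u0[0] keeps its initial value 0.0 (the boundary condition).
u0 = np.zeros(N)
for i in range(1, N):
    u0[i] = v(x[i])
```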