How can I write code that shows me the index at which Newdate1 and Newdate2 are located within Setups? The value for Newdate1 within Setups is at the second index, which should output 1 for result. The np.where function does not work, however. How could I do this without a for loop?
import numpy as np
Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
'2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
'2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
'2017-12-22T03:56:00.000000000'],dtype="datetime64[ns]")
Newdate1 = np.array(['2017-09-15T07:11:00.000000000'], dtype="datetime64[ns]")
Newdate2 = np.array(['2017-12-22T03:26:00.000000000'], dtype="datetime64[ns]")
result = np.where(Setups == Newdate1)
result2 = np.where(Setups == Newdate2)
Expected Output:
result: 1
result2: 4
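Use np.in1d (np.isin in newer NumPy versions) to build a boolean mask marking which elements of Setups appear in the array of dates you are searching for, then pass that mask to np.where to get the indices.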
import numpy as np
Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
'2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
'2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
'2017-12-22T03:56:00.000000000'],dtype="datetime64[ns]")
newdates = np.array(['2017-09-15T07:11:00.000000000','2017-12-22T03:26:00.000000000'],dtype="datetime64[ns]")
print(np.where(np.in1d(Setups,newdates)))
output:
(array([0, 4]),)
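On recent NumPy releases np.in1d is deprecated in favor of np.isin, which takes the same two arguments here. A minimal sketch of the same lookup follows. The searchsorted variant is an extra suggestion, not part of the answer above, and it assumes that Setups is sorted and that every searched date is actually present.

```python
import numpy as np

Setups = np.array(['2017-09-15T07:11:00.000000000', '2017-09-15T11:25:00.000000000',
                   '2017-09-15T12:11:00.000000000', '2017-12-22T03:14:00.000000000',
                   '2017-12-22T03:26:00.000000000', '2017-12-22T03:31:00.000000000',
                   '2017-12-22T03:56:00.000000000'], dtype="datetime64[ns]")
newdates = np.array(['2017-09-15T07:11:00.000000000',
                     '2017-12-22T03:26:00.000000000'], dtype="datetime64[ns]")

# Boolean mask of the Setups entries that occur in newdates, then their positions.
idx = np.flatnonzero(np.isin(Setups, newdates))
print(idx)  # [0 4]

# Alternative: valid only because Setups is sorted and both dates are present.
idx_sorted = np.searchsorted(Setups, newdates)
print(idx_sorted)  # [0 4]
```

np.flatnonzero returns a plain 1-D index array, which avoids unpacking the one-element tuple that np.where returns.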
UPDATE:
I went around the problem with a DataFrame:
import pandas as pd
import numpy as np
data = {'x0': [1, 1, 1, 1, 1], 'x1': [2, 3, 5, 7, 8], 'x2': [1, 5, 3, 6, 7], 'y': [3, 2, 4, 5, 8]}
df = pd.DataFrame(data)
# y = β(0) + β1x1 + β2x2
X = df[['x0','x1','x2']].to_numpy()
Y = df[['y']].to_numpy()
X_transpose = (X.transpose())
beta_hats = np.linalg.inv(X_transpose.dot(X)).dot(X_transpose.dot(Y))
print(beta_hats)
df = pd.DataFrame(beta_hats)
df.rename(columns = {0:'Beta_Hats'}, inplace = True)
print(df)
I wrote the following program to find the beta coefficients from a set of matrices with NumPy. When I converted the array to a list, I ran into a problem: some of the decimals were off:
Array output:
[[ 0.5 ]
[ 1. ]
[-0.25]]
list output: [[0.49999999999999784], [1.0000000000000022], [-0.2500000000000009]]
I am aware Python has some limitations with calculations, but I was wondering if anyone has figured a way around this. Any help would be much appreciated! I haven't been coding for too long (since May) so sorry if this may seem a bit simple to some of you:
import pandas as pd
import numpy as np
data = {'x0': [1, 1, 1, 1, 1], 'x1': [2, 3, 5, 7, 8], 'x2': [1, 5, 3, 6, 7], 'y': [3, 2, 4, 5, 8]}
df = pd.DataFrame(data)
X = df[['x0','x1','x2']].to_numpy()
Y = df[['y']].to_numpy()
X_transpose = (X.transpose())
beta_hats = np.linalg.inv(X_transpose.dot(X)).dot(X_transpose.dot(Y))
print(beta_hats)
beta_list = beta_hats.tolist()  # avoid shadowing the built-in list
print(beta_list)
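The two outputs hold the same numbers: NumPy prints arrays with 8 digits of precision by default, while a Python list shows the full floating-point representation. A minimal sketch of one way to tidy the list is below. It rounds before converting, and it swaps in np.linalg.lstsq for the explicit inverse, which is my substitution rather than the method used in the question. The 10-decimal rounding is an arbitrary choice.

```python
import numpy as np

# Same data as in the question, written directly as arrays.
X = np.array([[1, 2, 1],
              [1, 3, 5],
              [1, 5, 3],
              [1, 7, 6],
              [1, 8, 7]], dtype=float)
Y = np.array([[3], [2], [4], [5], [8]], dtype=float)

# Least-squares solve, used here instead of inverting X'X explicitly.
beta_hats, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Round away the representation noise before converting to a plain list.
beta_list = np.round(beta_hats, 10).tolist()
print(beta_list)
```

If only the display matters, formatting each value when printing (for example with an f-string such as f"{value:.4f}") leaves the stored numbers untouched.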
I have a requirement to query a column in a pyspark.sql.dataframe.DataFrame. I wish to create a string array from that column. I am using NumPy arrays to achieve this; however, the result I get is an array of arrays:
import numpy as np
df = spark.read.load('parquetfiles/part-00000-e7dad738-8895-45e8-9926-39c9d677b999-c000.snappy.parquet', format='parquet')
data_array = np.asarray(df.select('name').collect())
print(type(data_array),data_array)
for x in data_array:
    str = x[0]
print(type(x))
The output I get from my first print is:
<class 'numpy.ndarray'> [['London']
['New York']
['Paris']
['Rome']
['Berlin']]
And from the second print I get:
<class 'numpy.ndarray'>
So my question: is it possible to get these values as a string array or, failing that, can I create a dynamic list to which I add the values of str in my for loop as strings?
Things I've tried.
Use asarray instead of array; as you can see, I get the same result.
data_array = list(data_array): I get a list, but it's not usable because it still contains all the metadata too.
Open to suggestions and additional reading rather than full solutions.
Thanks.
The power of the post.
import numpy as np
df = spark.read.load('parquetfiles/part-00000-e7dad738-8895-45e8-9926-39c9d677b999-c000.snappy.parquet', format='parquet')
data_array = np.asarray(df.select('name').collect())
cases = []
for x in data_array:
    str = x[0]
    cases.append(str)
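The loop above can also be written without an explicit append. The sketch below uses a hand-built array as a stand-in for np.asarray(df.select('name').collect()), so it runs without Spark. That stand-in is an assumption based on the output printed in the question.

```python
import numpy as np

# Stand-in for the collected Spark rows: one single-field row per city.
data_array = np.array([['London'], ['New York'], ['Paris'], ['Rome'], ['Berlin']])

# Option 1: take the first (only) field of every row.
cases = [row[0] for row in data_array]

# Option 2: flatten the (5, 1) array to shape (5,), then convert to Python strings.
flat = data_array.ravel()
names = flat.tolist()
print(names)
```

ravel gives a 1-D NumPy string array, and tolist turns it into ordinary Python str objects, which also avoids reusing the built-in name str as a variable.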
I have four arrays, all of which contain zeros and NaNs, and I'm trying to get a total count of the elements that are nonzero and non-NaN across all arrays. MWE:
import numpy as np
np.random.seed(100)
array = np.random.rand(10,5)
array[0][0] = np.nan
array[1][0] = np.nan
array[0][3] = np.nan
array[5][2] = 0
array[5][4] = np.nan
If I type
np.count_nonzero(np.logical_and(~np.isnan(array[1]), ~np.isnan(array[2]), ~np.isnan(array[3])))
I get an output of 4 as expected. But adding one more condition like
np.count_nonzero(np.logical_and(~np.isnan(array[1]), ~np.isnan(array[2]), ~np.isnan(array[3]), ~np.isnan(array[9])))
gives me
Traceback (most recent call last):
  File "<ipython-input-36-02311cb3ca54>", line 1, in <module>
    np.count_nonzero(np.logical_and(~np.isnan(array[1]), ~np.isnan(array[2]), ~np.isnan(array[3]), ~np.isnan(array[9])))
ValueError: invalid number of arguments
Why do I get the error by adding one more condition?
np.logical_and takes only two input arrays and returns their element-wise AND. It accepts at most three positional arguments (the third is the output array), which is why you get the error with four, and why the three-argument call was not doing what you wanted either. A better option is to rewrite the logic as follows; you can then add more rows simply by extending the list of row indices:
(~np.isnan(array[[1,2,3,9]])).all(axis=0).sum()
# 4
or:
np.count_nonzero((~np.isnan(array[[1,2,3,9]])).all(axis=0))
# 4
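If you prefer to keep one condition per row, np.logical_and.reduce combines any number of boolean arrays. This is a minimal sketch using the array from the question. The extra (array[r] != 0) test is my addition, reflecting the "nonzero" requirement in the question's wording.

```python
import numpy as np

np.random.seed(100)
array = np.random.rand(10, 5)
array[0][0] = np.nan
array[1][0] = np.nan
array[0][3] = np.nan
array[5][2] = 0
array[5][4] = np.nan

rows = [1, 2, 3, 9]

# One mask per row: True where the entry is neither NaN nor zero.
masks = [~np.isnan(array[r]) & (array[r] != 0) for r in rows]

# AND all masks together element-wise, however many there are.
all_valid = np.logical_and.reduce(masks)
count = np.count_nonzero(all_valid)
print(count)  # 4
```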
I think the answer is that you are using count_nonzero incorrectly.
From the docs, it takes only 2 parameters: numpy.count_nonzero(a, axis=None)
And also you can't call
So why not something like this:
import numpy as np
np.random.seed(100)
array = np.random.rand(10,5)
array[0][0] = np.nan
array[1][0] = np.nan
array[0][3] = np.nan
array[5][2] = 0
array[5][4] = np.nan
print(np.count_nonzero((array)) - np.sum(np.isnan(array)))
count_nonzero treats NaN as nonzero, so the count includes the NaNs; subtract those.
Do that for each array and add the results.
e.g.
mySum = np.count_nonzero((array[1])) - np.sum(np.isnan(array[1]))
mySum += np.count_nonzero((array[2])) - np.sum(np.isnan(array[2]))
mySum += np.count_nonzero((array[3])) - np.sum(np.isnan(array[3]))
mySum += np.count_nonzero((array[9])) - np.sum(np.isnan(array[9]))
print(mySum)
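The four lines above can be collapsed into a single expression by selecting the rows first. Note that this total counts every valid element of every selected row, which is a different quantity from counting the positions that are valid in all rows at once. The sketch below computes both for comparison; the names per_element and per_column are just illustrative.

```python
import numpy as np

np.random.seed(100)
array = np.random.rand(10, 5)
array[0][0] = np.nan
array[1][0] = np.nan
array[0][3] = np.nan
array[5][2] = 0
array[5][4] = np.nan

sub = array[[1, 2, 3, 9]]  # the four rows of interest, shape (4, 5)

# Valid elements summed over all four rows (what the loop-free version of mySum gives).
per_element = np.count_nonzero(sub) - np.count_nonzero(np.isnan(sub))
print(per_element)  # 19

# Positions that are nonzero and non-NaN in every one of the four rows.
per_column = np.count_nonzero(np.all(~np.isnan(sub) & (sub != 0), axis=0))
print(per_column)  # 4
```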
I am trying to print two different lists, one with NumPy and one with pandas.
The strange thing is that I can only print one list at a time, by commenting out the other one with all its associated code. Do NumPy and pandas have any dependencies on each other?
import numpy as np
import pandas as pd
np.array = []
for i in range(7):
    np.array.append([])
    np.array[i] = i
values = np.array
print(np.power(np.array,3))
df = pd.DataFrame({'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]})
print(df)
I'm not sure what you mean by "I can only print one list at a time, by commenting out the other one with all its associated code", but any strange behavior you're seeing probably comes from assigning to np.array, which replaces NumPy's array function with your list. You should give your variable a different name, e.g. arr. Perhaps you were trying to do this:
arr = []
for i in range(7):
    arr.append([])
    arr[i] = i
values = np.array(arr)
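If the goal is simply the integers 0 to 6 and their cubes, the list-building loop is not needed at all. A minimal sketch, assuming that is what the question's loop was meant to produce:

```python
import numpy as np

# np.arange builds the sequence 0..6 directly as an array.
values = np.arange(7)

# np.power works element-wise on the whole array.
cubes = np.power(values, 3)
print(cubes)  # [  0   1   8  27  64 125 216]
```

Because nothing is assigned to an attribute of the np module here, later code that relies on np.array keeps working.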
I was just solving a problem using Python, and my code is:
from math import sin,pi
import numpy
import numpy as np
import pylab
N=20
x = np.linspace(0,1, N)
def v(x):
    return 100*sin(pi*x)
#set up initial condition
u0 = [0.0] # Boundary conditions at t= 0
for i in range(1,N):
    u0[i] = v(x[i])
I want to plot the results after updating v(x) in range(0, N). It looks simple, but perhaps you could help, since it gives me an error:
Traceback (most recent call last):
  File "/home/universe/Desktop/Python/sample.py", line 13, in <module>
    u0[i] = v(x[i])
IndexError: list assignment index out of range
You could change u0[i] = v(x[i]) to u0.append(v(x[i])). But it is more elegant to write:
u0 = [v(xi) for xi in x]
Indices i are bug magnets.
Since you are using numpy, I'd suggest using np.vectorize. That way you can pass the array x directly to the function and the function will return an array of the same size with the function applied on each element of the input array.
from math import sin,pi
import numpy
import numpy as np
import pylab
N=20
x = np.linspace(0,1, N)
def v(x):
    return 100*sin(pi*x)
vectorized_v = np.vectorize(v) #so that the function takes an array of x's and returns an array again
u0 = vectorized_v(x)
Out:
array([ 0.00000000e+00, 1.64594590e+01, 3.24699469e+01,
4.75947393e+01, 6.14212713e+01, 7.35723911e+01,
8.37166478e+01, 9.15773327e+01, 9.69400266e+01,
9.96584493e+01, 9.96584493e+01, 9.69400266e+01,
9.15773327e+01, 8.37166478e+01, 7.35723911e+01,
6.14212713e+01, 4.75947393e+01, 3.24699469e+01,
1.64594590e+01, 1.22464680e-14])
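np.vectorize is mainly a convenience wrapper that still calls the Python function once per element. For this particular function a simpler route is to use NumPy's own np.sin and np.pi, which already operate on whole arrays. A minimal sketch of that variant (my substitution for math.sin, not what the answer above uses):

```python
import numpy as np

N = 20
x = np.linspace(0, 1, N)

def v(x):
    # np.sin is applied element-wise, so x may be a scalar or an array.
    return 100 * np.sin(np.pi * x)

u0 = v(x)
print(u0.shape)  # (20,)
```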
u0 is a list with one element, so you can't assign values to indices that don't exist yet. Instead, make u a dictionary:
u = {}
u[0] = 0.0
for i in range(1,N):
    u[i] = v(x[i])
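Another way to keep the index-based loop is to allocate all N slots up front, so that every u0[i] already exists when it is assigned. A minimal sketch, assuming an array of zeros is an acceptable container for the initial condition:

```python
from math import sin, pi

import numpy as np

N = 20
x = np.linspace(0, 1, N)

def v(x):
    return 100 * sin(pi * x)

# Preallocate N entries; u0[0] keeps the boundary value 0.0.
u0 = np.zeros(N)
for i in range(1, N):
    u0[i] = v(x[i])

print(len(u0))  # 20
```

Unlike the dictionary, this keeps u0 as an ordered numeric array that can be passed directly to plotting functions.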