I have a matrix with dimensions 20009x28011 and I want to calculate its textural features. The problem is that the calculation needs positive integer values, but the matrix values are floats and some of them are negative.
I tried normalizing the values and converting them to integers, but it does not work.
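One common way to get positive integers out of a float matrix is to shift by the minimum, scale by the range, and bin the result into a fixed number of levels. The sketch below assumes the texture routine expects integer levels in 1..levels, that all values are finite, and that 256 levels is acceptable; the function name `quantize_to_levels` is hypothetical. For a matrix as large as 20009x28011 it may be worth processing it in blocks, since the intermediate float array is as large as the input.

```python
import numpy as np

def quantize_to_levels(matrix, levels=256):
    """Map float values (possibly negative) to integers in 1..levels."""
    lo = matrix.min()
    hi = matrix.max()
    if hi == lo:
        # A constant matrix has no range to scale; put everything in level 1.
        return np.ones(matrix.shape, dtype=np.uint16)
    scaled = (matrix - lo) / (hi - lo)              # values now lie in [0, 1]
    binned = np.floor(scaled * levels).astype(np.int64)
    binned = np.clip(binned, 0, levels - 1)         # the maximum lands in the top bin
    return (binned + 1).astype(np.uint16)           # shift to 1..levels

example = np.array([[-1.5, 0.0], [2.5, 0.5]])
print(quantize_to_levels(example, levels=8))
```

If the routine accepts zero as a valid level, the final `+ 1` can be dropped.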
I have an oceanic weather dataset over three dimensions (time, x, y), which includes two data arrays for two different variables (Hs, Te).
I want to compute a third data array (power generated) based on the values of the other two data arrays, but I cannot do it arithmetically. I have a 2D pandas dataframe (power matrix) which gives the power output depending on the combination of Hs and Te.
[enter image description here][1]
I have tried to pass the Hs and Te data arrays to loc and iloc, but they do not accept multidimensional indexers.
Is there a way I can use Hs, and Te, for every given timestep and coordinates, to calculate the power output from the matrix and assign it to the third data array?
I will really appreciate it if someone can help me!
Here is a bit of what I have tried.
```
def get_gen(ds, power_matrix):
    Hs = ds['wave_height']
    Te = ds['wave_period']
    power_mat = power_matrix
    power = power_mat.loc[[Hs], [Te]]
    return power

test = get_gen(ds, power_matrix)
```
[1]: https://i.stack.imgur.com/sp5FU.png
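A possible approach is to do the lookup on plain NumPy arrays: find, for every Hs and Te value, the index of the nearest bin along each axis of the power matrix, then index the table with both index arrays at once. This is a minimal sketch under several assumptions: the power matrix rows are labelled by Hs and the columns by Te, nearest-bin lookup is what is wanted (rather than interpolation), and there are no missing values. The name `lookup_power` and the sample numbers are hypothetical.

```python
import numpy as np

def lookup_power(hs, te, hs_bins, te_bins, table):
    """Nearest-bin lookup of table[hs_bin, te_bin] for inputs of any shape."""
    hs = np.asarray(hs, dtype=float)
    te = np.asarray(te, dtype=float)
    hs_bins = np.asarray(hs_bins, dtype=float)
    te_bins = np.asarray(te_bins, dtype=float)
    # For each value, the position of the closest bin label along that axis.
    i = np.abs(hs[..., None] - hs_bins).argmin(axis=-1)
    j = np.abs(te[..., None] - te_bins).argmin(axis=-1)
    # Integer-array indexing returns an array with the same shape as hs and te.
    return np.asarray(table)[i, j]

# Made-up 3 x 2 power matrix: rows are Hs bins, columns are Te bins.
hs_bins = [0.5, 1.5, 2.5]
te_bins = [5.0, 7.0]
table = [[0, 1], [10, 11], [20, 21]]
print(lookup_power([[0.6, 2.4]], [[6.9, 5.1]], hs_bins, te_bins, table))
```

With the objects from the question, the inputs would presumably be `ds['wave_height'].values`, `ds['wave_period'].values`, `power_matrix.index.to_numpy()`, `power_matrix.columns.to_numpy()` and `power_matrix.to_numpy()`, and the result has the same (time, x, y) shape as the inputs, so it can be assigned back to the dataset with the same dimensions. The temporary arrays grow with the number of bins, so a very large dataset may need to be processed one timestep at a time.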
I have a 2D array of size 30*70.
I have 70 columns. My values are very large, ranging from 8066220960081 to (some number with the same power of 10 as the lower limit), and I need to plot a scatter plot in an array. How do I index into the array given such large values?
Also, I need to do this in kernel space.
Let's take an array long long int A with large values.
A[0] = 393782040
A[1] = 2*393782040
... and so on
A[N] = 8066220960081; where N = 30*70 - 1
We can scale A by a factor, or we can shift A by a certain number and then scale it. That way you can work with numbers ranging between 0 and 1, or -1 and 1, or x and y; choose whichever suits your need. Theoretically, this should not make a difference to the scatter plot other than the placement of the axis. However, if your scatter plot is also representative of the underlying values, i.e. the dots are proportional to the values, then it is a good idea to be nice to your plotting tool and not flood it with terribly large values that might lead to overflow, depending on how the plotting code is written.
PS: I would assume you know how to flatten a 2d array.
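The shift-and-scale idea can be sketched as follows. This is an illustration in Python rather than kernel code, it assumes the array has already been flattened, and the helper name `shift_and_scale` is hypothetical.

```python
def shift_and_scale(values):
    """Shift by the minimum and divide by the range, giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so there is nothing to spread out.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Made-up stand-in for A: multiples of 393782040, as in the example above.
A = [393782040 * k for k in range(1, 5)]
print(shift_and_scale(A))
```

Mapping to another range such as -1 to 1 is then a matter of multiplying by the width of the target range and adding its lower bound.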
I ended up computing a regular interval between min and max, and then using min + interval*index to get the number, where index is the index into the array.
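A rough sketch of that interval scheme, using integer arithmetic only so that it would translate to code without floating point. It is written in Python for illustration; the function names and the choice of 30*70 slots are assumptions.

```python
def make_interval(lo, hi, count):
    """Width of one slot when [lo, hi] is split into `count` regular slots."""
    return max((hi - lo) // count, 1)   # never return a zero-width slot

def value_at(lo, interval, index):
    """Number represented by a given array index: min + interval*index."""
    return lo + interval * index

def index_of(value, lo, interval, count):
    """Array index for a large value; the maximum is clamped into the last slot."""
    return min((value - lo) // interval, count - 1)

lo, hi, count = 393782040, 8066220960081, 30 * 70
interval = make_interval(lo, hi, count)
print(index_of(lo, lo, interval, count), index_of(hi, lo, interval, count))
```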
In short, what I want is:
custom_2d_matrix(n, m) where columns 1..x are arrays of integers with size n and columns x+1..m are arrays of real numbers with size n. (1 <= x <= m)
For example, I want to create a custom type, which is a mixed precision matrix, or you can call it a matrix with different data types.
The custom_2d_matrix has n rows and m columns.
Columns 1 to x are column vectors with integer data type.
Columns x+1 to m are column vectors with real data type.
Note that 1 <= x <= m
The custom_2d_matrix must be accessible through index only. For example, custom_2d_matrix(i, j) will bring up the i-th array (can be either integer array or real array) and then the j-th element in the said array. So for this reason, derived type is not good enough.
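To make the intended layout concrete, here is a language-neutral illustration written in Python; it is not a solution in the language of the question. It stores each column as its own typed array and exposes only index-based access. The class name `MixedMatrix` and the 0-based indices are assumptions of the sketch.

```python
from array import array

class MixedMatrix:
    """m columns of n elements; the first x are integer, the rest are real."""

    def __init__(self, n, m, x):
        if not 1 <= x <= m:
            raise ValueError("x must satisfy 1 <= x <= m")
        self.n, self.m, self.x = n, m, x
        # 'q' is a signed 64-bit integer array, 'd' a double-precision array.
        self.columns = [array("q", [0] * n) for _ in range(x)]
        self.columns += [array("d", [0.0] * n) for _ in range(m - x)]

    def __getitem__(self, key):
        i, j = key                      # i-th column array, then its j-th element
        return self.columns[i][j]

    def __setitem__(self, key, value):
        i, j = key
        self.columns[i][j] = value      # the typed array rejects a float in an integer column

matrix = MixedMatrix(n=3, m=4, x=2)
matrix[0, 1] = 7        # integer column
matrix[3, 2] = 2.5      # real column
print(matrix[0, 1], matrix[3, 2])
```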
I am having trouble understanding the following question:
Define a two dimensional array data[12][5] of type double. Initialize the elements in the first column with values from 2.0 to 3.0 inclusive in steps of 0.1. If the first element in a row has the value of x, populate the remaining elements in each row with the values 1/x, x^2, x^3, x^4. Output the values in the array with each row on a separate line and with a heading for each column.
The part I don't understand is where the exercise says that if the first element has the value of x, then fill each row with...
My question: what is x in this context?
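One reading of the exercise is that x is not a separate variable: it is just a name for whatever value sits in the first column of a given row, so every row has its own x. The sketch below follows that reading in Python for illustration. Note that 2.0 to 3.0 in steps of 0.1 is only 11 values, so the sketch fills 11 rows; what should go in a twelfth row is not stated in the exercise. The function names are hypothetical.

```python
def build_table(rows=11):
    """Each row is [x, 1/x, x^2, x^3, x^4], where x is that row's first element."""
    table = []
    for r in range(rows):
        x = 2.0 + 0.1 * r               # value stored in column 0 of row r
        table.append([x, 1 / x, x**2, x**3, x**4])
    return table

def format_table(table):
    """One heading line, then one line per row."""
    header = f"{'x':>10}{'1/x':>10}{'x^2':>10}{'x^3':>10}{'x^4':>10}"
    lines = [header]
    for row in table:
        lines.append("".join(f"{value:>10.4f}" for value in row))
    return "\n".join(lines)

print(format_table(build_table()))
```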
I know that PCA does not tell you which features of a dataset are the most significant, but which combinations of features keep the most variance.
How could you use the fact that PCA rotates the dataset in such a way that it has the most variance along the first dimension, second most along second, and so on to reduce the dimensionality of the dataset?
More specifically, how are the first N eigenvectors used to transform the feature vectors into a lower-dimensional representation that keeps most of the variance?
Let X be an N x d matrix where each row X_{n,:} is a vector from the dataset.
Then, assuming each column of X has been centered to zero mean, X'X is the covariance matrix up to a constant scale factor, and an eigendecomposition gives X'X=UDU', where U is a d x d matrix of eigenvectors with U'U=I and D is a d x d diagonal matrix of eigenvalues.
The form of the eigendecomposition means that U'X'XU=U'UDU'U=D which means that if you transform your dataset by U then the new dataset, XU, will have a diagonal covariance matrix.
If the eigenvalues are ordered from largest to smallest, this also means that the sum of squared values (and hence the average squared value) of the first transformed feature, given by the expression U_1'X'XU_1=\sum_n (\sum_d X_{n,d} U_{d,1})^2 where U_1 is the first column of U, will be larger than that of the second, the second larger than the third, etc.
If we order the features of a dataset from largest to smallest average squared value, then if we just get rid of the features with small average squared values (and the large ones are much larger than the small ones), then we haven't lost much information. Concretely, keeping only the first k columns of U gives the N x k dataset XU_{:,1:k}, which is the lower-dimensional representation. That is the concept.
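The steps above can be sketched with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `pca_reduce`, the random test data and the choice of k are all assumptions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k eigenvectors with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                    # center each feature
    eigvals, U = np.linalg.eigh(Xc.T @ Xc)     # X'X = U D U' (eigh returns ascending order)
    order = np.argsort(eigvals)[::-1]          # reorder from largest to smallest
    eigvals, U = eigvals[order], U[:, order]
    Z = Xc @ U[:, :k]                          # N x k transformed dataset
    kept = eigvals[:k].sum() / eigvals.sum()   # fraction of total variance kept
    return Z, kept

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
Z, kept = pca_reduce(X, k=2)
print(Z.shape, kept)
```

The columns of Z are uncorrelated, because Z'Z is the top-left k x k block of D, and `kept` shows how little is lost when the discarded eigenvalues are small.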