What is the number after the datatype of a pandas Series? - python-3.10

If I run

import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

I get the Series printed along with "dtype: int64".
What does the 64 mean?
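The 64 is the bit width of each element: int64 means every value in the Series is stored as a 64-bit integer. A minimal sketch showing this, and how to request a narrower type via the standard dtype argument of pd.Series:

import pandas as pd

a = [1, 7, 2]
s = pd.Series(a)
print(s.dtype)    # int64: each element occupies 64 bits (8 bytes)

# You can ask for a narrower integer type explicitly
s32 = pd.Series(a, dtype='int32')
print(s32.dtype)  # int32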

Related

How to count, for 2 different arrays, how many times the elements are repeated, in MATLAB?

I have array A (44x1) and B (41x1), and I want to count for both arrays how many times each element is repeated. And if a repeated value is present in both arrays, I want its counts to be divided (for instance: value 0.5 appears 500 times in A and 350 times in B, so now divide 500 by 350).
I have to do this for bigger arrays as well, so I was thinking about using a loop (but I have no idea how to do it in MATLAB).
I got what I want in Python:
import pandas as pd
data1 = pd.read_excel('C:/Users/Desktop/Python/data1.xlsx')
data2 = pd.read_excel('C:/Users/Desktop/Python/data2.xlsx')
for i in data1['Mag'].value_counts() & data2['Mag'].value_counts():
    a = data1['Mag'].value_counts() / data2['Mag'].value_counts()
    print(a)
    break
Any idea how to do the same in MATLAB? Thanks!
Since you can enumerate all valid earthquake magnitude values, you could use:
% Make up some data
A = randi([2 58], [100 1]) / 10;
B = randi([2 58], [20 1]) / 10;
% Round data to nearest tenth
% A = round(A, 1); % uncomment if necessary
% B = round(B, 1); % same
% Divide frequencies
validmags = 0.2:0.1:5.8;
% The comparisons below rely on implicit expansion: A and B must be column
% vectors and validmags a row vector. The dimension argument to sum() is only
% a reminder to the user; double() is not really needed.
Afreqs = sum(double(abs(A - validmags) < 1e-6), 1);
Bfreqs = sum(double(abs(B - validmags) < 1e-6), 1);
Bfreqs ./ Afreqs
% For a fancier version:
% [{'Magnitude'} num2cell(validmags); {'Freq(B)/Freq(A)'} num2cell(Bfreqs./Afreqs)].'
The last line will produce NaN for 0/0, +Inf for nonzero/0, and 0 for 0/nonzero.
You could also use uniquetol, align the unique values of each vector, and divide the respective absolute frequencies. But I think the above approach is cleaner and easier to understand.
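For comparison, a minimal pandas sketch of the same frequency-ratio idea (the column name 'Mag' is carried over from the question; the short Series here are made-up stand-ins for the Excel data):

import pandas as pd

A = pd.Series([0.5, 0.5, 1.2, 0.5, 1.2])  # stand-in for data1['Mag']
B = pd.Series([0.5, 1.2, 1.2])            # stand-in for data2['Mag']

# Dividing two value_counts() Series aligns them on their index, so each
# magnitude's count in A is divided by its count in B; magnitudes present
# in only one Series come out as NaN
ratio = A.value_counts() / B.value_counts()
print(ratio)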

Pandas DataFrame from Numpy Array - column order

I'm trying to read data from a .csv file using pandas, smoothing it with a Savitzky-Golay filter, filtering it, and then using pandas again to write an output csv file. The data must be converted from a DataFrame to an array to perform the smoothing, and then back to a DataFrame to create the output file.
I found a topic on creating a DataFrame from numpy arrays (Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?) and I used the dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]}) line to create mine.
The problem is that when I rename the columns to 'time' for the first column and 'angle' for the second one, the order in the final DataFrame changes. It seems as if alphabetical order matters, which seems weird.
Can someone help me with an explanation?
My complete code:
import scipy as sp
from scipy import signal
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Specify the input file
in_file = '0_chunk0_test.csv'
# Define min and max angle values
alpha_min = 35
alpha_max = 45
# Define Savitzky-Golay filter parameters
window_length = 15
polyorder = 1
# Read input .csv file, but only time and pitch values using usecols argument
data = pd.read_csv(in_file,usecols=[0,2])
# Replace ":" with "" in time values
data['time'] = data['time'].str.replace(':','')
# Convert the pandas DataFrame to a numpy array, then use .astype to
# convert strings to floats
data_arr = data.to_numpy(copy=True)
data_arr = data_arr.astype(float)
# Perform Savitzky-Golay filtering with signal.savgol_filter
data_arr_smooth = signal.savgol_filter(data_arr[:,1],window_length,polyorder)
# Convert smoothed data array to dataframe and rename Pitch: to angle
data_fr = pd.DataFrame({'time': data_arr[:,0],'angle': data_arr_smooth})
print(data_fr)
Your question is essentially: why does this code result in a column order that is alphabetical, rather than the order that I provided?
data_fr = pd.DataFrame({'time': data_arr[:,0],'angle': data_arr_smooth})
Recent versions of pandas actually do what you want, with columns ['time', 'angle'] rather than ['angle', 'time'].
Up to Python 3.5, dictionaries did not preserve the order of their keys; by sorting alphabetically, pandas would at least give a reproducible column order. pandas 0.23 (released May 2018) changed this: when running on Python 3.6+, it preserves the dict's insertion order instead.
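If you are stuck on an older version, you can pin the order explicitly with the columns argument of pd.DataFrame (a minimal sketch; the arrays here are made-up stand-ins for the question's data):

import numpy as np
import pandas as pd

t = np.array([0.0, 1.0, 2.0])
angle = np.array([35.2, 35.4, 35.1])

# columns= fixes the column order regardless of dict key order
data_fr = pd.DataFrame({'time': t, 'angle': angle}, columns=['time', 'angle'])
print(data_fr.columns.tolist())   # ['time', 'angle']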
If your data is already in a dataframe, it's much easier to just pass the values of the Pitch column to savgol_filter:
data_arr_smooth = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
data_fr = pd.DataFrame({'time': data.time.values,'angle': data_arr_smooth})
There's no need to explicitly convert your data to float as long as they are numeric; savgol_filter will do this for you:
If x is not a single or double precision floating point array, it
will be converted to type numpy.float64 before filtering.
If you want both the original and the smoothed data in your original dataframe, then just assign a new column to it:
data['angle'] = signal.savgol_filter(data.Pitch.values, window_length, polyorder)

MATLAB seemingly changing array values when appending matrices

I have a very strange bug in MATLAB (R2016a) where appending a ones array using vertcat (or using regular appending with [A; B]) results in a matrix where the ones have been scaled down to 0.0001 instead of 1. Multiplying the ones matrix by 10000 fixes the issue but I would like to know why 0.0001 is being appended instead of 1. Here is the code:
temp = ones([1,307200]);
new_coords = vertcat(world_coords, temp);
new_coords
which results in columns like the following being output:
0.4449
0.3673
1.8984
0.0001
The type for world_coords is double, so I don't think typecasting is the issue.
As mentioned in my comment, the output is scaled due to the range of the values in world_coords. You should see in the first line of the output a scaling factor of 1.0e+4.
You can change the output format for example with:
format long
For more details see: format
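NumPy behaves similarly when an array's values span a wide range: it switches to scientific notation, which can make values look rescaled at a glance. For comparison, a minimal sketch of forcing fixed-point display there (np.set_printoptions is standard NumPy API):

import numpy as np

x = np.array([0.4449, 0.3673, 1.8984, 0.0001])
print(x)   # the wide value range triggers scientific notation

# suppress=True forces fixed-point display of small numbers
np.set_printoptions(suppress=True)
print(x)   # [0.4449 0.3673 1.8984 0.0001]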

R: Defining large arrays

I need arrays whose size is greater than the maximum integer value. This is my code:
input <- array(0,c(n,i,j))
n is '33', i is '134395553' and j is '671'. The total number of elements, n*i*j, is greater than the maximum integer value, so I get this error:
Error in array(0,c(n,i,j)) :
negative length vectors are not allowed
In addition: Warning message:
In array(0,c(n,i,j)) :
NAs introduced by coercion to integer range
So what can I do for such a large array?
Unfortunately I really do need an array this large. I have a rating matrix of 163949 items and 671 users. I want to build a priority matrix, so I will have an array of 671 users and 134395553 items. I am also extracting 33 features for each (user, priority) pair, which means I need an array of 33 by 671 by 134395553.
as.bigz in the gmp library lets you store arbitrarily large integer values.
i <- as.bigz(134395553)
i
Big Integer ('bigz') :
[1] 134395553
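Note that bigz only addresses the integer-overflow part of the problem: a dense double-precision array of those dimensions would hold 33 × 671 × 134,395,553 ≈ 3.0 × 10^12 elements, or roughly 24 TB at 8 bytes each, so in practice some sparse or chunked representation would still be needed.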

How to convert a cell array of 2D matrices into a multidimensional array in MATLAB

In MATLAB, I have a defined cell array C of
size(C) = 1 by 150
Each matrix T of this cell C is of size
size(C{i}) = 8 by 16
I am wondering if there is a way to define a new multidimensional (3D) matrix M that is of size 8 by 16 by 150
That is when I write the command size(M) I get 8 by 16 by 150
Thank you! Looking forward to your answers
If I'm understanding your problem correctly, you have a cell array of 150 cells, and each cell element is 8 x 16, and you wish to stack all of these matrices together in the third dimension so you have a 3D matrix of size 8 x 16 x 150.
It's as simple as:
M = cat(3, C{:});
This syntax may look strange, but it's perfectly valid. The command cat performs concatenation of matrices where the first parameter is the dimension you want to concatenate along... so in your case, that's the third dimension, and the parameters after are the matrices you want to concatenate to make the final matrix.
Doing C{:} creates what is known as a comma-separated list. This is equivalent to typing out the following syntax in MATLAB:
C{1}, C{2}, C{3}, ..., C{150}
Therefore, by doing cat(3, C{:});, what you're really doing is:
cat(3, C{1}, C{2}, C{3}, ..., C{150});
As such, you're taking all of the 150 cells and concatenating them all together in the third dimension. However, instead of having to type out 150 individual cell entries, that is encapsulated by creating a comma-separated list via C{:}.
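For comparison, the same stacking operation in Python/NumPy (a minimal sketch; the random matrices are made-up stand-ins for the cell array's contents):

import numpy as np

# A list of 150 matrices, each 8 x 16, standing in for the cell array C
C = [np.random.rand(8, 16) for _ in range(150)]

# Stack along a new third axis, analogous to cat(3, C{:}) in MATLAB
M = np.stack(C, axis=2)
print(M.shape)   # (8, 16, 150)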
