I have an empty pandas DataFrame:
aqi_df = pd.DataFrame(columns = ["IMEI","Date","pm10conc_24hrs","pm25conc_24hrs","sdPm10","sdPm25","aqi","windspeed","winddirection","severity","health_impact"] )
I want to add elements one by one to each column:
for i in range(1,10):
    aqi_df.IMEI.append("a")
    aqi_df.Date.append("b")
    aqi_df.pm10conc_24hrs.append("c")
.
.
.
But append throws an error:
TypeError: cannot concatenate a non-NDFrame object
How can I append elements to a pandas DataFrame one by one?
IIUC you can use:
aqi_df = pd.DataFrame(columns = ["IMEI","Date","pm10conc_24hrs"] )
print (aqi_df)
for i in range(1,10):
    aqi_df.loc[i] = ['a','b','c']
print (aqi_df)
  IMEI Date pm10conc_24hrs
1    a    b              c
2    a    b              c
3    a    b              c
4    a    b              c
5    a    b              c
6    a    b              c
7    a    b              c
8    a    b              c
9    a    b              c
But it is better to create the DataFrame from Series or a dict:
IMEI = pd.Series(['aa','bb','cc'])
Date = pd.Series(['2016-01-03','2016-01-06','2016-01-08'])
pm10conc_24hrs = pd.Series(['w','e','h'])
aqi_df = pd.DataFrame({'a':IMEI,'Date':Date,'pm10conc_24hrs':pm10conc_24hrs})
print (aqi_df)
         Date   a pm10conc_24hrs
0  2016-01-03  aa              w
1  2016-01-06  bb              e
2  2016-01-08  cc              h
aqi_df = pd.DataFrame({'a':['aa','bb','cc'],
'Date':['2016-01-03','2016-01-06','2016-01-08'],
'pm10conc_24hrs':['w','e','h']})
print (aqi_df)
         Date   a pm10conc_24hrs
0  2016-01-03  aa              w
1  2016-01-06  bb              e
2  2016-01-08  cc              h
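If the values really do arrive one at a time, a common pattern is to collect the rows in a plain Python list and build the DataFrame once at the end. The following is a minimal sketch under that assumption; the placeholder values 'a', 'b', 'c' are taken from the question, and the `rows` list is a name introduced here for illustration.

```python
import pandas as pd

columns = ["IMEI", "Date", "pm10conc_24hrs"]

# Accumulate plain dicts; appending to a Python list is cheap.
rows = []
for i in range(1, 10):
    rows.append({"IMEI": "a", "Date": "b", "pm10conc_24hrs": "c"})

# Build the DataFrame once, after the loop has finished.
aqi_df = pd.DataFrame(rows, columns=columns)
print(aqi_df)
```

This avoids growing the DataFrame row by row, which tends to get slow for larger loops because each assignment may reallocate the underlying data.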
I have a data set as below:
data={ 'StoreID':['a','b','c','d'],
'Sales':[1000,200,500,800],
'Profit':[600,100,300,500]
}
data=pd.DataFrame(data)
data.set_index(['StoreID'],inplace=True,drop=True)
X=data.values
from sklearn.metrics.pairwise import euclidean_distances
dist=euclidean_distances(X)
Now I get an array as below:
array([[0. ,943,583,223],
[943, 0.,360,721],
[583,360,0., 360],
[223,721,360, 0.]])
My goal is to get the unique combinations of stores and their corresponding distances. I would like the end result to be a data frame like the one below:
Store NextStore Dist
a b 943
a c 583
a d 223
b c 360
b d 721
c d 360
Thank you for your help!
You probably want pandas.melt which will "unpivot" the distance matrix into tall-and-skinny format.
m = pd.DataFrame(dist)
m.columns = list('abcd')
m['Store'] = list('abcd')
...which produces:
a b c d Store
0 0.000000 943.398113 583.095189 223.606798 a
1 943.398113 0.000000 360.555128 721.110255 b
2 583.095189 360.555128 0.000000 360.555128 c
3 223.606798 721.110255 360.555128 0.000000 d
Melt data into tall-and-skinny format:
pd.melt(m, id_vars=['Store'], var_name='nextStore')
Store nextStore value
0 a a 0.000000
1 b a 943.398113
2 c a 583.095189
3 d a 223.606798
4 a b 943.398113
5 b b 0.000000
6 c b 360.555128
7 d b 721.110255
8 a c 583.095189
9 b c 360.555128
10 c c 0.000000
11 d c 360.555128
12 a d 223.606798
13 b d 721.110255
14 c d 360.555128
15 d d 0.000000
Remove redundant rows, convert dist to int, and sort:
df2 = pd.melt(m, id_vars=['Store'],
var_name='NextStore',
value_name='Dist')
df3 = df2[df2.Store < df2.NextStore].copy()
df3.Dist = df3.Dist.astype('int')
df3.sort_values(by=['Store', 'NextStore'])
Store NextStore Dist
4 a b 943
8 a c 583
12 a d 223
9 b c 360
13 b d 721
14 c d 360
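As an alternative to melting and filtering, the upper triangle of the distance matrix can be indexed directly. This is only a sketch: it recomputes the distances with NumPy instead of sklearn so that it is self-contained, and the variable names `stores`, `i`, `j` and `pairs` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

stores = np.array(['a', 'b', 'c', 'd'])
# Sales and Profit per store, as in the question.
X = np.array([[1000, 600],
              [200, 100],
              [500, 300],
              [800, 500]], dtype=float)

# Pairwise Euclidean distances via broadcasting (same values as euclidean_distances(X)).
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Row/column indices of the strict upper triangle: each unordered pair exactly once.
i, j = np.triu_indices(len(stores), k=1)

pairs = pd.DataFrame({'Store': stores[i],
                      'NextStore': stores[j],
                      'Dist': dist[i, j].astype(int)})
print(pairs)
```

Because `np.triu_indices` walks the matrix row by row, the result is already ordered by Store and then NextStore, so no extra sort is needed.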
I have a 2D array with shape (14576, 24), which are respectively the number of elements and the number of features that are going to constitute my dataframe.
data_from_2Darray = {}
for i in range(name_2Darray.shape[1]):
    data_from_2Darray["df_column_name{}".format(i)] = name_2Darray[:,i]
#visualize the result
pd.DataFrame(data=data_from_2Darray).plot()
#the converted array in dataframe
df = pd.DataFrame(data=data_from_2Darray)
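The loop above builds one dict entry per column. pandas can also take the 2D array directly together with a list of column names, which should give the same frame. A minimal sketch, using a small hypothetical array in place of the real (14576, 24) one:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the (14576, 24) array from the snippet above.
name_2Darray = np.arange(12, dtype=float).reshape(4, 3)

# One name per column, following the same naming scheme as the loop.
column_names = ["df_column_name{}".format(i)
                for i in range(name_2Darray.shape[1])]

df = pd.DataFrame(name_2Darray, columns=column_names)
print(df)
```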
I have a set of values in the following pattern.
A B C D
1 5 6 11
2 6 5 21
3 7 3 42
4 3 7 22
1 2 3 54
2 3 2 43
3 4 3 27
4 3 2 14
I exported every column into the MATLAB workspace as follows.
A = xlsread('F:\R.xlsx','Complete Data','A2:A43');
B = xlsread('F:\R.xlsx','Complete Data','B2:B43');
C = xlsread('F:\R.xlsx','Complete Data','C2:C43');
D = xlsread('F:\R.xlsx','Complete Data','D2:D43');
I need help with code that, for each value in column A, finds the lowest D value and outputs the corresponding B and C values. I need the output to look like this:
1 5 6 11
2 6 5 21
3 4 3 27
4 3 2 14
I read through related questions and understand that I need to turn the data into a matrix, sort it by the 4th column using sortrows, and get the indices of the sorted elements. But I am stuck here. Please guide me.
You can export those columns in one go as:
ABCD = xlsread('F:\R.xlsx','Complete Data','A2:D43');
Now use sortrows to sort the rows according to the first and the fourth column.
req = sortrows(ABCD, [1 4]);
☆ If all elements of the first column exist twice then:
req = req(1:2:end,:);
☆ If the elements of the first column do not necessarily all occur twice, then:
[~, ind] = unique(req(:,1));
req = req(ind,:);
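For comparison, the same "row with the lowest D for each value of A" selection can be sketched in pandas. This is not part of the MATLAB answer; it assumes the eight sample rows from the question and uses names (`df`, `req`) chosen here for illustration.

```python
import pandas as pd

# The eight sample rows from the question.
df = pd.DataFrame({'A': [1, 2, 3, 4, 1, 2, 3, 4],
                   'B': [5, 6, 7, 3, 2, 3, 4, 3],
                   'C': [6, 5, 3, 7, 3, 2, 3, 2],
                   'D': [11, 21, 42, 22, 54, 43, 27, 14]})

# idxmin gives, for each value of A, the row label where D is smallest.
req = df.loc[df.groupby('A')['D'].idxmin()].reset_index(drop=True)
print(req)
```

Like the `unique` variant above, this does not require every value of A to occur the same number of times.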
I have a sheet with the following info
"A" "B"
_____
a 1
a 1
a 1
a 1
a 1
a 1
b 1
b 1
b 0
b 1
b 1
b 1
c 1
c 0
c 1
d 1
d 1
I would like to have an array formula that multiplies all values in column "B" that have the same text in column "A".
I don't want to use a pivot table; I need to solve this with formulas.
The end result should be in column "C":
a = 1
b = 0
c = 0
d = 1
thanks
I can do it in two columns....
Column C formula
=IF(A3=A2,C3*B2,B2)
Column D formula
=IF(A2<>A1,CONCATENATE(A2," = ",C2),"")
gives results like this:
[Image: Excel screenshot of the results]
if the blanks are a problem,
http://www.cpearson.com/excel/NoBlanks.aspx
DPRODUCT() might do it, as well, if you can setup the criteria correctly.
OK, I solved it using this formula as an array formula:
=+SI(SUMA(SI((A:A=A1);B:B;0))=CONTAR.SI(A:A;A1);"OK";"")
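Outside of Excel, the per-group product described in the question can be sketched in pandas with a grouped `prod`. This is only an illustration using the sample data from the question; the names `sheet` and `result` are introduced here.

```python
import pandas as pd

# Sample data from the question: label in column A, 0/1 flag in column B.
sheet = pd.DataFrame({
    'A': ['a'] * 6 + ['b'] * 6 + ['c'] * 3 + ['d'] * 2,
    'B': [1, 1, 1, 1, 1, 1,
          1, 1, 0, 1, 1, 1,
          1, 0, 1,
          1, 1],
})

# Multiply all B values that share the same text in A.
result = sheet.groupby('A')['B'].prod()
print(result)
```

With 0/1 values the product is 1 only when every row of the group is 1, which is the same condition the formulas above test for.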
I have rank scores of countries for different variables.
I would like to create a column with the maximum rank that occurs per row.
Say the data look something like:
A B C D E F G H I ....
V1 1 4 5 3 12 . 6 9 83
V2 . . 4 6 1 4 7 6 32
So A - X are countries. In rows V1 and up you have various variables, and the cells hold the rank score relating to that variable.
The problem is that some countries, for whatever reason, don't score on certain variables, perhaps because V1 is not relevant to country C or whatever.
So in the end I'd like something like:
A B C D E F G H I .... newv
V1 1 4 5 3 12 . 6 9 83 83
V2 . . 4 6 1 4 7 6 5 6
I think egen newvar=rowmax(A B C D E F G H I…) does what you need. Have a look at the egen help file for more information. (I presume you need value 7 in the second row, not 6?)
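The same row-wise maximum that skips missing values can be sketched in pandas for comparison. This assumes the two rows of the second table in the question (the one that ends in 5 for V2), with "." treated as missing; the name `ranks` is introduced here.

```python
import numpy as np
import pandas as pd

# Rows V1 and V2 from the second table in the question; "." becomes NaN.
ranks = pd.DataFrame(
    [[1, 4, 5, 3, 12, np.nan, 6, 9, 83],
     [np.nan, np.nan, 4, 6, 1, 4, 7, 6, 5]],
    index=['V1', 'V2'],
    columns=list('ABCDEFGHI'),
)

# max(axis=1) works across columns and skips NaN by default.
ranks['newv'] = ranks.max(axis=1)
print(ranks)
```

For these rows the new column is 83 for V1 and 7 for V2, which matches the value the answer above presumes for the second row.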
I have an array of names and a function that returns a data frame. I want to combine this array and the data frames. For example:
>mynames<-c("a", "b", "c")
>df1 <- data.frame(val0=c("d", "e"),val1=4:5)
>df2 <- data.frame(val1=c("e", "f"),val2=5:6)
>df3 <- data.frame(val2=c("f", "g"),val3=6:7)
What I want is a data frame that joins this array with the data frames: df1 corresponds to "a", df2 corresponds to "b", and so on. So the final data frame looks like this:
Names Var Val
a d 4
a e 5
b e 5
b f 6
c f 6
c g 7
Can someone help me on this?
Thanks.
This answers this particular question, but I'm not sure how much help it will be for your actual problem:
myList <- list(df1, df2, df3)
do.call(rbind,
        lapply(seq_along(mynames), function(x)
          cbind(Names = mynames[x],
                setNames(myList[[x]], c("Var", "Val")))))
# Names Var Val
# 1 a d 4
# 2 a e 5
# 3 b e 5
# 4 b f 6
# 5 c f 6
# 6 c g 7
Here, we create a list of your data.frames, and in our lapply call, we add in the new "Names" column and rename the existing columns so that we can use rbind to put them all together.
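The same idea (add a name column, rename the columns so they line up, then stack) can be sketched in pandas for comparison. This is not a translation endorsed by the answer above, just an illustration with the three small frames from the question; `parts` and `combined` are names introduced here.

```python
import pandas as pd

mynames = ["a", "b", "c"]
df1 = pd.DataFrame({"val0": ["d", "e"], "val1": [4, 5]})
df2 = pd.DataFrame({"val1": ["e", "f"], "val2": [5, 6]})
df3 = pd.DataFrame({"val2": ["f", "g"], "val3": [6, 7]})

parts = []
for name, frame in zip(mynames, [df1, df2, df3]):
    part = frame.copy()
    part.columns = ["Var", "Val"]   # give every frame the same column names
    part.insert(0, "Names", name)   # label each row with the matching name
    parts.append(part)

# Stack the relabelled frames into one data frame.
combined = pd.concat(parts, ignore_index=True)
print(combined)
```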