Turn a pandas dataframe into a two dimensional array

I have a dataframe with three columns: X, Y, and counts, where counts is the number of occurrences where X and Y appear together. My goal is to transform this dataframe into a two-dimensional array where X gives the row names, Y gives the column names, and the counts make up the entries in the table.
Is this possible? I can elaborate if needed.

To get the same result as a pivot table, you can also perform a groupby operation and then unstack one of the columns:
import numpy as np
import pandas as pd
df = pd.DataFrame({'color': ['red', 'blue', 'black'] * 2,
'vehicle': ['car', 'truck'] * 3,
'value': np.arange(1, 7)})
>>> df
color value vehicle
0 red 1 car
1 blue 2 truck
2 black 3 car
3 red 4 truck
4 blue 5 car
5 black 6 truck
>>> df.groupby(['color', 'vehicle']).sum().unstack('vehicle')
value
vehicle car truck
color
black 3 6
blue 5 2
red 1 4

Here is an IPython session that may be a good simulation of what you are trying to do:
In [17]: import pandas as pd
In [18]: from random import randint
In [19]: x = ['a', 'b', 'c'] * 4
In [20]: y = ['i', 'j', 'k', 'l'] * 3
In [21]: counts = [randint(10, 20) for i in range(12)]
In [22]: df = pd.DataFrame(dict(x=x, y=y, counts=counts))
In [23]: df.head()
Out[23]:
counts x y
0 16 a i
1 10 b j
2 16 c k
3 15 a l
4 19 b i
In [24]: df.pivot(index='x', columns='y', values='counts')
Out[24]:
y i j k l
x
a 16 14 18 15
b 19 10 15 20
c 10 18 16 16
In [25]: df.pivot(index='x', columns='y', values='counts').values
Out[25]:
array([[16, 14, 18, 15],
[19, 10, 15, 20],
[10, 18, 16, 16]], dtype=int64)
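If some (x, y) pairs are missing from the data, or a pair appears more than once, a variation on the pivot above may be easier to work with. The following is only a sketch with made-up data: it uses pivot_table with fill_value=0 to fill the gaps and keeps the row and column labels next to the plain array.

```python
import pandas as pd

# Hypothetical long-format data: one row per (x, y) pair with its count.
# The pair (c, j) is deliberately missing.
df = pd.DataFrame({
    "x": ["a", "a", "b", "b", "c"],
    "y": ["i", "j", "i", "j", "i"],
    "counts": [16, 14, 19, 10, 12],
})

# pivot_table tolerates missing or repeated pairs; fill_value=0 fills the gaps.
table = df.pivot_table(index="x", columns="y", values="counts",
                       aggfunc="sum", fill_value=0)

matrix = table.to_numpy()            # plain 2-D NumPy array
row_labels = table.index.tolist()    # names of the rows (x values)
col_labels = table.columns.tolist()  # names of the columns (y values)

print(matrix)
print(row_labels, col_labels)
```

Keeping row_labels and col_labels around lets you map an array position back to its x and y names after the DataFrame is gone.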

Related

Find unique tuples inside a numpy array with np.where

I want to find unique color tuples inside a numpy array with np.where. My code so far is:
from __future__ import print_function
import numpy as np
a = range(10)
b = range(10,20)
c = range(20,30)
d = np.array(list(zip(a, b, c)))
print(d)
e = np.array(list(zip(c, b, a)))
print(e)
search = np.array((1,11,21))
search2 = np.array((0,11,21))
print(search, search2)
f = np.where(d == search, d, e)
g = np.where(d == search2, d, e)
print(f)
print(g)
When I run the code, it correctly finds the tuple search at the second position. But the tuple search2 is also partly matched at the first position, although it is not contained as a whole tuple in the array. How can I tell numpy to match only whole tuples, so that g has the value
[[20 10 0] [21 11 1] [22 12 2]
[23 13 3] [24 14 4] [25 15 5]
[26 16 6] [27 17 7] [28 18 8] [29 19 9]]
while the whole tuple search is still found, so that f has the value
[[20 10 0] [ 1 11 21] [22 12 2]
[23 13 3] [24 14 4] [25 15 5]
[26 16 6] [27 17 7] [28 18 8] [29 19 9]]
?
EDIT:
OK, so the actual problem is writing a GIF decoder in Python. I have a previous frame called image_a and a following frame called image_b. image_b contains pixels with a certain transparency color tuple, called transp_color; for the uploaded images it is (0, 16, 8). The routine is supposed to replace every pixel that has this color tuple with the corresponding pixel from image_a, while leaving all other pixels unchanged. My code is:
from __future__ import print_function
import numpy as np
import cv2
image_a = cv2.imread("old_frame.png")
image_b = cv2.imread("new_frame.png")
cv2.imshow("image_a", image_a)
cv2.imshow("image_b", image_b)
transp_color = (0, 16, 8)
new_image = np.where(image_b == transp_color, image_a, image_b)
cv2.imshow("new_image", new_image)
cv2.waitKey()
Trying to solve this with np.where leads to wrong colors in the resulting image as seen above in the 3rd picture. So any idea how to solve this?
The expression image_b == transp_color results in an NxMx3 array of booleans, and that is what np.where acts upon as well, so each channel is selected independently. If I understand your question correctly, the simplest solution is to turn that expression into np.all(image_b == transp_color, axis=-1, keepdims=True). This returns an NxMx1 array of booleans, which np.where broadcasts over the color channel, so that each whole pixel is picked from either image A or image B.
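As a minimal sketch of that suggestion, the snippet below uses two tiny made-up 2x2 frames in place of the real images, so the array contents are assumptions for illustration only. Note that cv2.imread returns channels in BGR order, so with real images the color tuple may need to be reversed.

```python
import numpy as np

# Two tiny hypothetical 2x2 frames with 3 channels each
# (stand-ins for image_a and image_b).
image_a = np.full((2, 2, 3), 100, dtype=np.uint8)
image_b = np.array([[[0, 16, 8], [0, 1, 2]],
                    [[0, 16, 9], [0, 16, 8]]], dtype=np.uint8)
transp_color = (0, 16, 8)

# True only where all three channels match; shape (2, 2, 1).
mask = np.all(image_b == transp_color, axis=-1, keepdims=True)

# The mask broadcasts across the channel axis, so whole pixels are swapped.
new_image = np.where(mask, image_a, image_b)
print(new_image)
```

The pixel (0, 16, 9) matches in two of three channels but is left untouched, which is exactly the case the per-channel comparison got wrong.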
OK, finally found the solution myself, here is the code for doing it:
import numpy as np
import cv2
image_a = cv2.imread("old_frame.png")
image_b = cv2.imread("new_frame.png")
cv2.imshow("image_a", image_a)
cv2.imshow("image_b", image_b)
transp_color = (0, 16, 8)[::-1]
channels = 3
f = np.all((image_b==transp_color), axis=-1)
flattened_image = np.reshape(image_b, (image_b.shape[0]*image_b.shape[1], channels))
old_flattened_image = np.reshape(image_a, (image_a.shape[0]*image_a.shape[1], channels))
f = np.reshape(f, (image_a.shape[0]*image_a.shape[1], 1))
np_image = np.array([old_flattened_image[i] if j else flattened_image[i] for i, j in enumerate(f)])
new_image = np.reshape(np_image, (image_a.shape[0], image_a.shape[1], channels))
# new_image = np.where(image_b == transp_color, image_a, image_b)
cv2.imshow("new_image", new_image)
cv2.waitKey()

Given an index of choices for each column, construct a 1D array from a 2D array

I have a 2D array such as:
julia> m = [1 2 3 4 5
6 7 8 9 10
11 12 13 14 15]
3×5 Array{Int64,2}:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
I want to pick one value from each column and construct a 1D array.
So for instance, if my choices are
julia> choices = [1, 2, 3, 2, 1]
5-element Array{Int64,1}:
1
2
3
2
1
Then the desired output is [1, 7, 13, 9, 5]. What's the best way to do that? In my particular application, I am randomly generating these values, e.g.
choices = rand(1:size(m)[1], size(m)[2])
Thank you!
This is probably the simplest approach:
[m[c, i] for (i, c) in enumerate(choices)]
EDIT:
If by best you mean fastest, a function like this should be approximately 2x faster than the comprehension for large m:
function selector(m, choices)
v = similar(m, size(m, 2))
for i in eachindex(choices)
@inbounds v[i] = m[choices[i], i]
end
v
end
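For readers coming from NumPy rather than Julia, the same per-column selection can be sketched with integer array indexing. This is an analogue and not part of the Julia answers; the one thing to watch is that NumPy indices are 0-based, so the choices from the question are shifted by one.

```python
import numpy as np

m = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15]])

# The question's choices are 1-based row numbers; subtract 1 for NumPy.
choices = np.array([1, 2, 3, 2, 1]) - 1

# Pair each chosen row index with its column index 0..4.
picked = m[choices, np.arange(m.shape[1])]
print(picked)
```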

how to separate matrices in an array?

I want to apply a function to each group of four matrices, for example 1:4, then 5:8, then 9:12, 13:16, 17:20, 21:24, and so on in my real data.
k = 24; n=3; m = 4
ary=array(1:24, c(n,m,k))
str(ary)
int [1:3, 1:4, 1:24] 1 2 3 4 5 6 7 8 9 10 ...
for each four matrices in ary fun {.........}
If you want to use a for-loop as suggested in the question, just do the following:
Seq <- seq(1, 24, 4)
for (i in Seq){
## i is 1, 5, 9, 13, 17, 21
ary[ , , i:(i+3)] # gets you the array with just four matrices
# do something ...
}

understand functions that operate on whole array in groupby aggregation

import numpy as np
import pandas as pd
df = pd.DataFrame({
'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
'odd1': pd.Series([1, 1, 2, 1, 2]),
'odd2': pd.Series([6, 7, 8, 9, 10])})
grpd = df.groupby(['clients', 'odd1']).agg({
'odd2': lambda x: x/float(x.sum())
})
print(grpd)
The desired result is:
A 1 0.619047619
2 0.380952381
B 1 0.473684211
2 0.526316
I have browsed around, but I still don't understand how lambdas that operate on the whole array, for example through x.sum(), work. Furthermore, I still miss the point of what x is in x.sum() with respect to the grouped columns.
You can do:
>>> df.groupby(['clients', 'odd1'])['odd2'].sum() / df.groupby('clients')['odd2'].sum()
clients odd1
A 1 0.619
2 0.381
B 1 0.474
2 0.526
Name: odd2, dtype: float64
or alternatively, use .transform to obtain values based on clients grouping and then sum for each clients and odd1 grouping:
>>> df['val'] = df['odd2'] / df.groupby('clients')['odd2'].transform('sum')
>>> df
clients odd1 odd2 val
0 A 1 6 0.286
1 A 1 7 0.333
2 A 2 8 0.381
3 B 1 9 0.474
4 B 2 10 0.526
>>> df.groupby(['clients', 'odd1'])['val'].sum()
clients odd1
A 1 0.619
2 0.381
B 1 0.474
2 0.526
Name: val, dtype: float64
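To see what x is in the question's lambda, it can help to iterate over the grouped column directly. The sketch below reuses the question's data: each x handed to an aggregation function is the Series of odd2 values for one (clients, odd1) group, so x.sum() is the total of that group only, not of the whole client. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "clients": ["A", "A", "A", "B", "B"],
    "odd1": [1, 1, 2, 1, 2],
    "odd2": [6, 7, 8, 9, 10],
})

# Each group is a (key, Series) pair; the Series is what a lambda receives as x.
groups = {key: x.tolist() for key, x in df.groupby(["clients", "odd1"])["odd2"]}
print(groups)

# Share of each (clients, odd1) group within its client's total.
group_sums = df.groupby(["clients", "odd1"])["odd2"].sum()
client_totals = df.groupby("clients")["odd2"].sum()
shares = group_sums.div(client_totals, level="clients")
print(shares)
```

Because the denominator has to come from the coarser clients grouping, a single lambda inside the (clients, odd1) aggregation cannot produce it on its own, which is why the answers above compute the client totals separately.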

storing value against variable name "QW1I5K20" in an array element Q[1,5,20] using R

I have an Excel file (.csv) with a sorted column of variable names such as "QW1I1K5" and numerical values against them.
This list goes on for
W from 1 to 15
I from 1 to 4
K from 1 to 30
total elements = 15*4*30 = 1800
I want to store the numerical values against these variables in an array whose indices are derived from the variable name.
For example, QW1I1K5 has a value of 11. This must be stored in the array element Q[1,1,5] = 11 (the index set [1,1,5] corresponds to W1, I1, K5).
Maybe this helps:
Q <- array(dat$Col2, dim=c(15,4,30))
dat$Col2[dat$Col1=='QW1I1K5']
#[1] 34
Q[1,1,5]
#[1] 34
dat$Col2[dat$Col1=='QW4I3K8']
#[1] 38
Q[4,3,8]
#[1] 38
If you want the indices along with the values:
library(reshape2)
d1 <- melt(Q)
head(d1,3)
# Var1 Var2 Var3 value
#1 1 1 1 12
#2 2 1 1 9
#3 3 1 1 29
Q[1,1,1]
#[1] 12
Q[3,1,1]
#[1] 29
Update
Suppose your data is in the order you described in the comments, which will be dat1:
indx <- read.table(text=gsub('[^0-9]+', ' ', dat1$Col1), header=FALSE)
dat2 <- dat1[do.call(order, indx[,3:1]),]
Q1 <- array(dat2$Col2,dim=c(15,4,30))
Q1[1,1,2]
#[1] 20
dat2$Col2[dat2$Col1=='QW1I1K2']
#[1] 20
data
Col1 <- do.call(paste,c(expand.grid('QW', 1:15, 'I', 1:4, 'K',1:30),
list(sep='')))
set.seed(24)
dat <- data.frame(Col1, Col2=sample(1:40, 1800,replace=TRUE))
dat1 <- dat[order(as.numeric(gsub('[^0-9]+', '', dat$Col1))),]
row.names(dat1) <- NULL
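The same idea, deriving the three indices from the variable name, can also be sketched outside R. The Python snippet below is an analogue under stated assumptions: the rows list is made-up stand-in data for the CSV contents, and the indices are shifted by one because NumPy arrays are 0-based.

```python
import re
import numpy as np

# Hypothetical (name, value) pairs standing in for the rows of the CSV file.
rows = [("QW1I1K5", 11), ("QW15I4K30", 7), ("QW2I3K20", 42)]

# 15 x 4 x 30 array; NaN marks entries that have not been filled.
Q = np.full((15, 4, 30), np.nan)
pattern = re.compile(r"QW(\d+)I(\d+)K(\d+)")

for name, value in rows:
    # Pull the W, I and K numbers out of the variable name.
    w, i, k = (int(g) for g in pattern.fullmatch(name).groups())
    Q[w - 1, i - 1, k - 1] = value  # shift to 0-based indexing

print(Q[0, 0, 4])  # value stored for QW1I1K5
```

Filling the array by parsed index does not depend on the order of the rows, so no sorting step is needed.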
I would suggest looking at using "data.table" and setting your key to the split columns. You can use cSplit from my "splitstackshape" package to easily split the column.
Sample Data:
df <- data.frame(
V1 = c("QW1I1K1", "QW1I1K2", "QW1I1K3",
"QW1I1K4", "QW2I1K5", "QW2I3K2"),
V2 = c(15, 20, 5, 6, 7, 9))
df
# V1 V2
# 1 QW1I1K1 15
# 2 QW1I1K2 20
# 3 QW1I1K3 5
# 4 QW1I1K4 6
# 5 QW2I1K5 7
# 6 QW2I3K2 9
Splitting the column:
library(splitstackshape)
out <- cSplit(df, "V1", "[A-Z]+", fixed = FALSE)
setnames(out, c("V2", "W", "I", "K"))
setcolorder(out, c("W", "I", "K", "V2"))
setkey(out, W, I, K)
out
# W I K V2
# 1: 1 1 1 15
# 2: 1 1 2 20
# 3: 1 1 3 5
# 4: 1 1 4 6
# 5: 2 1 5 7
# 6: 2 3 2 9
Extracting rows:
out[J(1, 1, 4)]
# W I K V2
# 1: 1 1 4 6
out[J(2, 3, 2)]
# W I K V2
# 1: 2 3 2 9
