(Disclaimer: I've simplified my problem to the salient points, what I want to do is slightly more complicated but I describe the core issue here.)
I am trying to build a network using keras to learn properties of some 5 by 5 matrices.
The input data is in the form of a 1000 by 5 by 5 numpy array, where each 5 by 5 sub-array represents a single matrix.
What I want the network to do is to use the properties of each row in the matrix, so I would like to split each 5 by 5 array into individual 1 by 5 arrays and pass each of these 5 arrays on to the next part of the network.
Here is what I have so far:
input_mat = keras.Input(shape=(5, 5), name='Input')

part_list = list()
for i in range(5):
    part_list.append(keras.layers.Lambda(lambda x: x[i, :])(input_mat))

dense_list = list()
for i in range(5):
    dense_list.append(keras.layers.Dense(10, activation='selu',
                                         use_bias=True)(part_list[i]))

conc = keras.layers.Concatenate(axis=-1, name='Concatenate')(dense_list)
dense_out = keras.layers.Dense(1, name='D_out', activation='sigmoid')(conc)

model = keras.Model(inputs=input_mat, outputs=dense_out)
model.compile(optimizer='adam', loss='mean_squared_error')
My problem is that this does not appear to train well, and looking at the model summary I am not sure that the network is splitting the inputs as I would like:
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input (InputLayer) (None, 5, 5) 0
__________________________________________________________________________________________________
lambda_5 (Lambda) (5, 5) 0 Input[0][0]
__________________________________________________________________________________________________
lambda_6 (Lambda) (5, 5) 0 Input[0][0]
__________________________________________________________________________________________________
lambda_7 (Lambda) (5, 5) 0 Input[0][0]
__________________________________________________________________________________________________
lambda_8 (Lambda) (5, 5) 0 Input[0][0]
__________________________________________________________________________________________________
lambda_9 (Lambda) (5, 5) 0 Input[0][0]
__________________________________________________________________________________________________
dense (Dense) (5, 10) 60 lambda_5[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (5, 10) 60 lambda_6[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (5, 10) 60 lambda_7[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (5, 10) 60 lambda_8[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (5, 10) 60 lambda_9[0][0]
__________________________________________________________________________________________________
Concatenate (Concatenate) (5, 50) 0 dense[0][0]
dense_1[0][0]
dense_2[0][0]
dense_3[0][0]
dense_4[0][0]
__________________________________________________________________________________________________
D_out (Dense) (5, 1) 51 Concatenate[0][0]
==================================================================================================
Total params: 351
Trainable params: 351
Non-trainable params: 0
The input and output nodes of the Lambda layers look wrong to me, though I'm afraid I'm still struggling to understand the concept.
Lambda layers are best avoided here; subclass keras.layers.Layer instead:
import tensorflow as tf

class Slice(keras.layers.Layer):
    def __init__(self, begin, size, **kwargs):
        super(Slice, self).__init__(**kwargs)
        self.begin = begin
        self.size = size

    def get_config(self):
        # store begin/size so the layer can be serialized and reloaded
        config = super().get_config().copy()
        config.update({
            'begin': self.begin,
            'size': self.size,
        })
        return config

    def call(self, inputs):
        return tf.slice(inputs, self.begin, self.size)
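A minimal sketch of how such a layer could replace the Lambda slices; the begin/size values follow tf.slice conventions (-1 in size keeps the full batch dimension), and the Reshape and the row_i names are just illustrative choices:

input_mat = keras.Input(shape=(5, 5), name='Input')

part_list = []
for i in range(5):
    # begin=[0, i, 0]: start at row i; size=[-1, 1, 5]: every sample, one row, all 5 columns
    row = Slice(begin=[0, i, 0], size=[-1, 1, 5], name='row_%d' % i)(input_mat)
    # drop the singleton row axis: (None, 1, 5) -> (None, 5)
    part_list.append(keras.layers.Reshape((5,))(row))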
In the line

part_list.append(keras.layers.Lambda(lambda x: x[i,:])(input_mat))

you are indexing the first (batch) dimension, so you are basically taking the first 5 of the 1000 matrices rather than the 5 rows of each matrix, which is not what you want to do.
To achieve what you want, try tensorflow's unstack operation:
part_list = tf.unstack(input_mat, axis=1)
This should give you a list of 5 tensors, one per row, each of shape [batch_size, 5] (here [1000, 5]).
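A minimal sketch of the full model rewritten this way, assuming a TensorFlow 2.x setup where TF ops applied to Keras tensors are wrapped into layers automatically (otherwise the same unstack call can go inside a single Lambda layer):

import tensorflow as tf
from tensorflow import keras

input_mat = keras.Input(shape=(5, 5), name='Input')

# split each 5x5 matrix into 5 row tensors, each of shape (None, 5)
part_list = tf.unstack(input_mat, axis=1)

# one Dense head per row
dense_list = [keras.layers.Dense(10, activation='selu', use_bias=True)(part)
              for part in part_list]

conc = keras.layers.Concatenate(axis=-1, name='Concatenate')(dense_list)
dense_out = keras.layers.Dense(1, name='D_out', activation='sigmoid')(conc)

model = keras.Model(inputs=input_mat, outputs=dense_out)
model.compile(optimizer='adam', loss='mean_squared_error')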
How can I create this kind of array in R?
iii <- seq(from = 1, to = 49, by = 2)
this only creates values:
1 3 5 .. 49
The array that I need to create:
1, 0, 3, 0, 5, 0, 7, ..., 0, 49
Using:
x <- 1:11
x * (x %% 2)
gives:
[1] 1 0 3 0 5 0 7 0 9 0 11
What this does:
x %% 2 creates a vector with ones at the odd values of x and zeros at the even values of x.
Multiplying x by x %% 2 thus gives a vector of the odd values with zeros in between.
Based on the suggestion of @lmo, you could also do:
x <- seq(1, 11, 2)
head(rep(x, each = 2) * (1:0), -1)
which will give the same result.
I have a pivot table array with factors and X and Y coordinates such as the one below, and I have a look-up table with 64 colours that have RGB values. I have assigned a colour to each factor combination using a dictionary of tuples, but I am having a hard time figuring out how to compare the keys of my dictionary (the different combinations of factors) to my array, so that each row with that factor combination can be assigned the colour given in the dictionary.
This is an example of the Pivot Table:
A B C D Xpoint Ypoint
0 1 0 0 20 20
0 1 1 0 30 30
0 1 0 0 40 40
1 0 1 0 50 50
1 0 1 0 60 60
EDIT: This is an example of the LUT:
R G B
0 0 0
1 0 103
0 21 68
95 173 58
and this is an example of the dictionary that was made:
{
(0, 1, 0, 0): (1, 0, 103),
(0, 1, 1, 0): (12, 76, 161),
(1, 0, 1, 0): (0, 0, 0)
}
This is the code that I have used:
import numpy as np
from PIL import Image, ImageDraw

## load in LUT of 64 colours ##
LUT = np.loadtxt('LUT64.csv', skiprows=1, delimiter=',')
print LUT

## load in XY COordinates ##
PivotTable = np.loadtxt('PivotTable_2017-07-13_001.txt', skiprows=1, delimiter='\t')
print PivotTable

## Bring in image ##
IM = Image.open("mothTest.tif")

# bring in number of factors
numFactors = 4

# assign colour vectors to factor combos
iterColours = iter(LUT)
colour_dict = dict()  # size will tell you how many colours will be used
for entry in PivotTable:
    key = tuple(entry[0:numBiomarkers])
    if key not in colour_dict:
        colour_dict[key] = next(iterColours)
print(colour_dict)
Is there a way to compare the tuples in this dictionary to the rows in the pivot table array, or maybe there is a better way of doing this? Any help would be greatly appreciated!
If your goal is, as I suppose from my comment above, to trace the colours back to each factor tuple, then you have already done everything you need. I do not quite see what role the tif file plays, though... Please note I corrected the reference to the non-existent numBiomarkers variable...
import numpy as np
from PIL import Image, ImageDraw

## load in LUT of 64 colours ##
LUT = np.loadtxt('LUT64.csv', skiprows=1, delimiter=',')
print LUT

## load in XY COordinates ##
PivotTable = np.loadtxt('PivotTable_2017-07-13_001.txt', skiprows=1, delimiter=',')
print PivotTable

## Bring in image ##
IM = Image.open("Lenna.tif")

# bring in number of factors
numFactors = 4

# assign colour vectors to factor combos
iterColours = iter(LUT)
colour_dict = dict()  # size will tell you how many colours will be used
for entry in PivotTable:
    key = tuple(entry[0:numFactors])
    if key not in colour_dict:
        colour_dict[key] = next(iterColours)
print(colour_dict)

print '===='
for entry in PivotTable:
    key = tuple(entry[0:numFactors])
    print str(entry) + ' ' + str(colour_dict[key])
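A minimal sketch of how the same lookup could be collected into one per-row colour array and drawn onto the image; this is illustrative only, and it assumes (as in the example table) that the last two columns of PivotTable are Xpoint and Ypoint and that IM is an RGB image:

# build an (n_rows, 3) RGB array, one colour per pivot-table row
row_colours = np.array([colour_dict[tuple(entry[0:numFactors])]
                        for entry in PivotTable])

# draw each (Xpoint, Ypoint) in its row's colour
draw = ImageDraw.Draw(IM)
for entry, rgb in zip(PivotTable, row_colours):
    x, y = entry[-2], entry[-1]
    draw.point((x, y), fill=tuple(int(v) for v in rgb))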
Can you please add a short example of LUT64.csv and of PivotTable_2017-07-13_001.txt? Maybe for the latter you should also use a delimiter other than \t to ensure portability of your examples.
Regards
I am trying to have a numpy array with random numbers from 0 to 1:
import numpy as np
x = np.random.random((3,3))
yields
[[ 0.11874238 0.71885484 0.33656161]
[ 0.69432263 0.25234083 0.66118676]
[ 0.77542651 0.71230397 0.76212491]]
And, from this array, I need the row,column combinations which have values bigger than 0.3. So the expected output should look like:
(0,1),(0,2),(1,0),(1,2),(2,0),(2,1),(2,2)
To extract the items (the values of x[row][column]) and write the output to a file, I tried the following code:
with open('newfile.txt', 'w') as fd:
    for row in x:
        for item in row:
            if item > 0.3:
                print(item)
                for row in item:
                    for col in item:
                        print(row, column, '\n')
                        fd.write(row, column, '\n')
However, it raises an error :
TypeError: 'numpy.float64' object is not iterable
Also, I searched but could not find how to start the numpy index from 1 instead of 0. For example, the expected output would look like this:
(1,2),(1,3),(2,1),(2,3),(3,1),(3,2),(3,3)
Do you know how to get these outputs?
Get the indices along the first two axes that match that criterion with np.nonzero/np.where on the mask of comparisons, and then simply index with integer array indexing -
r,c = np.nonzero(x>0.3)
out = x[r,c]
If you are looking to get those indices a list of tuples, zip those indices -
zip(r,c)
To get those starting from 1, add 1 and then zip -
zip(r+1,c+1)
On Python 3.x, you would need to wrap it with list() : list(zip(r,c)) and list(zip(r+1,c+1)).
Sample run -
In [9]: x
Out[9]:
array([[ 0.11874238, 0.71885484, 0.33656161],
[ 0.69432263, 0.25234083, 0.66118676],
[ 0.77542651, 0.71230397, 0.76212491]])
In [10]: r,c = np.nonzero(x>0.3)
In [14]: zip(r,c)
Out[14]: [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
In [18]: zip(r+1,c+1)
Out[18]: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
In [13]: x[r,c]
Out[13]:
array([ 0.71885484, 0.33656161, 0.69432263, 0.66118676, 0.77542651,
0.71230397, 0.76212491])
Writing indices to file -
Use np.savetxt with int format, like so -
In [69]: np.savetxt("output.txt", np.argwhere(x>0.3), fmt="%d", comments='')
In [70]: !cat output.txt
0 1
0 2
1 0
1 2
2 0
2 1
2 2
With the 1 based indexing, add 1 to np.argwhere output -
In [71]: np.savetxt("output.txt", np.argwhere(x>0.3)+1, fmt="%d", comments='')
In [72]: !cat output.txt
1 2
1 3
2 1
2 3
3 1
3 2
3 3
You could use np.where, which (when applied to a 2D array) returns two arrays with the indices of the rows (and corresponding columns) that satisfy the condition you specify as an argument.
Then you can zip these two arrays to get back a list of tuples:
list(zip(*np.where(x > 0.3)))
If you want to add 1 to every element of every tuple (i.e. use 1-based indexing), either loop over the tuples or add 1 to each array returned by np.where:
# np.where returns a tuple, whose items cannot be reassigned, so unpack the two index arrays first
r, c = np.where(x > 0.3)
r += 1  # adds one to every element of r thanks to broadcasting
c += 1
list(zip(r, c))
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

grpd = df.groupby(['clients', 'odd1']).agg({
    'odd2': lambda x: x / float(x.sum())
})
print grpd
The desired result is:
A 1 0.619047619
2 0.380952381
B 1 0.473684211
2 0.526316
I have browsed around, but I still don't understand how lambdas that operate on the whole array, e.g. x.sum(), work here. In particular, I don't see what x actually is in x.sum() with respect to the grouped columns.
You can do:
>>> df.groupby(['clients', 'odd1'])['odd2'].sum() / df.groupby('clients')['odd2'].sum()
clients odd1
A 1 0.619
2 0.381
B 1 0.474
2 0.526
Name: odd2, dtype: float64
or alternatively, use .transform to obtain values based on clients grouping and then sum for each clients and odd1 grouping:
>>> df['val'] = df['odd2'] / df.groupby('clients')['odd2'].transform('sum')
>>> df
clients odd1 odd2 val
0 A 1 6 0.286
1 A 1 7 0.333
2 A 2 8 0.381
3 B 1 9 0.474
4 B 2 10 0.526
>>> df.groupby(['clients', 'odd1'])['val'].sum()
clients odd1
A 1 0.619
2 0.381
B 1 0.474
2 0.526
Name: val, dtype: float64
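On the side question of what x is: inside .agg (and .apply / .transform), the function is called once per group and receives that group's 'odd2' values as a pandas Series, so x.sum() is the total of one group. A minimal sketch that makes this visible, reusing the df defined above:

# iterate the grouped column: each x is a Series of 'odd2' values for one client
for name, x in df.groupby('clients')['odd2']:
    print(name, list(x), 'sum =', x.sum())

# the transform approach above then divides each row by its client's total
df['val'] = df['odd2'] / df.groupby('clients')['odd2'].transform('sum')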
A web application can send to a function an array of arrays like
[
[
[1,2],
[3,4]
],
[
[],
[4,5,6]
]
]
The outer array has length n > 0. The middle arrays are of constant length, 2 in this example. The inner arrays can each have any length >= 0, including empty.
I could build the query by string concatenation, like this:
with t(a, b) as (
values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
)
select distinct a, b
from t
where
(a = any(array[1,2]) or array_length(array[1,2],1) is null)
and
(b = any(array[3,4]) or array_length(array[3,4],1) is null)
or
(a = any(array[]::int[]) or array_length(array[]::int[],1) is null)
and
(b = any(array[4,5,6]) or array_length(array[4,5,6],1) is null)
;
a | b
---+---
7 | 4
1 | 4
2 | 3
But I think I can do better like this
with t(a, b) as (
values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
), u as (
select unnest(a)::text[] as a
from (values
(
array[
'{"{1,2}", "{3,4}"}',
'{"{}", "{4,5,6}"}'
]::text[]
)
) s(a)
), s as (
select a[1]::int[] as a1, a[2]::int[] as a2
from u
)
select distinct a, b
from
t
inner join
s on
(a = any(a1) or array_length(a1, 1) is null)
and
(b = any(a2) or array_length(a2, 1) is null)
;
a | b
---+---
7 | 4
2 | 3
1 | 4
Notice that a text array was passed and then cast inside the function. That was necessary because PostgreSQL only accepts multidimensional arrays with matching dimensions, and the passed inner arrays can vary in length. I could "fix" them before passing by padding with some special value like zero to make them all the same length as the longest one, but I think it is cleaner to deal with that inside the function.
Am I missing something? Is it the best approach?
I like your second approach.
SELECT DISTINCT t.*
FROM (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN (
SELECT arr[1]::int[] AS a1
,arr[2]::int[] AS b1
FROM (
SELECT unnest(ARRAY['{"{1,2}", "{3,4}"}'
,'{"{}" , "{4,5,6}"}'
,'{"{5}" , "{}"}' -- added element to 1st dimension
])::text[] AS arr -- 1d text array
) sub
) s ON (a = ANY(a1) OR a1 = '{}')
AND (b = ANY(b1) OR b1 = '{}')
;
Suggesting only minor improvements:
Subqueries instead of CTEs for slightly better performance.
Simplified test for empty array: checking against literal '{}' instead of function call.
One less subquery level for unwrapping the array.
Result:
a | b
--+---
2 | 3
7 | 4
1 | 4
5 | 1
For the casual reader: wrapping the multi-dimensional integer array is necessary, since Postgres demands that (quoting the error message):
multidimensional arrays must have array expressions with matching dimensions
An alternative route would be to use a 2-dimensional text array and unnest it with generate_subscripts():
WITH a(arr) AS (SELECT '{{"{1,2}", "{3,4}"}
,{"{}", "{4,5,6}"}
,{"{5}", "{}"}}'::text[] -- 2d text array
)
SELECT DISTINCT t.*
FROM (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN (
SELECT arr[i][1]::int[] AS a1
,arr[i][2]::int[] AS b1
FROM a, generate_subscripts(a.arr, 1) i -- using implicit LATERAL
) s ON (t.a = ANY(s.a1) OR s.a1 = '{}')
AND (t.b = ANY(s.b1) OR s.b1 = '{}');
Might be faster, can you test?
In versions before 9.3 one would use an explicit CROSS JOIN instead of lateral cross joining.