Get indices of strings in an array - arrays

I have an array in numpy which looks like this:
myarray = ['a', 'b', 'c', 'd', 'e', 'f']
I would like to return an array of indices for 'b', 'c', 'd' which looks like this:
myind = [1,2,3]
I need this array of indices later for use in a loop. I am using Python 2.7. Thanks, folks.

You can use np.searchsorted -
In [61]: myarray = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
In [62]: search = np.array(['b', 'c', 'd'])
In [63]: np.searchsorted(myarray, search)
Out[63]: array([1, 2, 3])
If myarray is not alphabetically sorted, we need to use the additional argument sorter with it, like so -
In [64]: myarray = np.array(['a', 'd', 'b', 'e', 'c', 'f'])
In [65]: search = np.array(['b', 'c', 'd'])
In [67]: sidx = np.argsort(myarray)
In [69]: sidx[np.searchsorted(myarray, search, sorter=sidx)]
Out[69]: array([2, 4, 1])
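One caveat with this approach: np.searchsorted returns an insertion position even when a value is absent, so a missing string yields a plausible-looking but wrong index. Below is a minimal sketch of a wrapper that checks for this. The name find_indices is a hypothetical helper, not a NumPy function, and the sketch assumes a non-empty input array.

```python
import numpy as np

def find_indices(haystack, needles):
    """Return the index in haystack of each needle; raise if one is missing.

    find_indices is an illustrative helper name, not part of NumPy.
    """
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    sidx = np.argsort(haystack)
    pos = np.searchsorted(haystack, needles, sorter=sidx)
    # searchsorted returns len(haystack) for values beyond the last element,
    # so clip to keep the lookup below in bounds
    pos = np.clip(pos, 0, len(haystack) - 1)
    idx = sidx[pos]
    # A needle is really present only if the element found equals it
    found = haystack[idx] == needles
    if not found.all():
        raise ValueError("not found: %s" % needles[~found].tolist())
    return idx

print(find_indices(['a', 'd', 'b', 'e', 'c', 'f'], ['b', 'c', 'd']))  # [2 4 1]
```

If the array contains duplicates, this returns the first occurrence in sorted order for each search value.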

If your array does not contain any duplicates, then np.searchsorted should do the trick. If your array contains duplicates, then you have to use np.argwhere, because np.searchsorted reports only one position per search value.
Examples:
input_array = np.array(['a','b','c','d','e','f','a'])
search = np.array(['a','b','c'])
np.searchsorted(input_array, search)
output >> array([0, 1, 2])
np.argwhere(input_array == 'a')
output >> array([[0],[6]])
For a more general solution you can do
np.concatenate((np.argwhere(input_array == 'a'),
                np.argwhere(input_array == 'b'),
                np.argwhere(input_array == 'c')))
output >> array([[0],[6],[1],[2]])
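The per-value argwhere calls can also be collapsed into a single membership test. This is a sketch assuming NumPy 1.13 or later, where np.isin is available. Note that the indices come back in order of position in input_array rather than grouped by search value.

```python
import numpy as np

input_array = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'a'])
search = np.array(['a', 'b', 'c'])

# Boolean mask: True wherever an element equals any of the search strings
mask = np.isin(input_array, search)

# Flat indices of every match, duplicates included
indices = np.flatnonzero(mask)
print(indices)  # [0 1 2 6]
```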

Related

Python Iterating an Array

I have been trying to iterate through an array.
Below is the code.
x = ['lemon', 'tea', 'water']
def randomShuffle(arr, n):
    from random import choices
    newList = []
    for item in arr:
        r = choices(arr, k=n)
        if r.count(item) <= 2:
            newList.append(item)
    return newList
I would like to know the logic for writing it, please.
Thank you all.
Use a while loop: if every item is to appear twice, then the resulting array should be twice the length of the input one.
And of course, check that you do not add the same item more than twice to the result ;)
choices returns a list (of size 1 by default), so I use [0] to get the element.
from random import choices

xx = ["a", "b", "c"]
def my_function(x):
    res = []
    # keep drawing until every item has been added twice
    while len(res) < len(x) * 2:
        c = choices(x)[0]
        if res.count(c) < 2:
            res.append(c)
    return res
my_function(xx)
> ['c', 'c', 'a', 'b', 'a', 'b']
my_function(xx)
> ['a', 'b', 'b', 'a', 'c', 'c']
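If the goal is simply a random ordering in which every item appears exactly twice, a shorter alternative is to double the list and shuffle it. This is a sketch of that different technique, not the while-loop approach above, and shuffled_twice is a hypothetical name.

```python
import random

def shuffled_twice(items):
    """Return a new list with every item appearing exactly twice, in random order."""
    doubled = list(items) * 2   # e.g. ['a', 'b', 'c', 'a', 'b', 'c']
    random.shuffle(doubled)     # shuffles the copy in place
    return doubled

print(shuffled_twice(["a", "b", "c"]))
```

This avoids the rejected draws of the while loop, at the cost of building the doubled list up front.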

A way to fix some columns of a numpy array to be integer and the rest as float? [duplicate]

I have two different arrays, one with strings and another with ints. I want to concatenate them, into one array where each column has the original datatype. My current solution for doing this (see below) converts the entire array into dtype = string, which seems very memory inefficient.
combined_array = np.concatenate((A, B), axis = 1)
Is it possible to have multiple dtypes in combined_array when A.dtype = string and B.dtype = int?
One approach might be to use a record array. The "columns" won't be like the columns of standard numpy arrays, but for most use cases, this is sufficient:
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> b = numpy.arange(5)
>>> records = numpy.rec.fromarrays((a, b), names=('keys', 'data'))
>>> records
rec.array([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)],
          dtype=[('keys', '|S1'), ('data', '<i8')])
>>> records['keys']
rec.array(['a', 'b', 'c', 'd', 'e'],
          dtype='|S1')
>>> records['data']
array([0, 1, 2, 3, 4])
Note that you can also do something similar with a standard array by specifying the datatype of the array. This is known as a "structured array":
>>> arr = numpy.array([('a', 0), ('b', 1)],
...                   dtype=([('keys', '|S1'), ('data', 'i8')]))
>>> arr
array([('a', 0), ('b', 1)],
      dtype=[('keys', '|S1'), ('data', '<i8')])
The difference is that record arrays also allow attribute access to individual data fields. Standard structured arrays do not.
>>> records.keys
chararray(['a', 'b', 'c', 'd', 'e'],
          dtype='|S1')
>>> arr.keys
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
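To connect this back to the question, here is a sketch of building a structured array from two existing arrays and then viewing it as a record array. The field names 'label' and 'number' are illustrative assumptions. They are chosen so that they do not clash with existing ndarray attributes, since attribute access on a record array only falls back to field names when no regular attribute of that name exists (a field called data, for instance, is reachable as records['data'] but not as records.data).

```python
import numpy as np

a = np.array(['a', 'b', 'c'])
b = np.arange(3)

# One field per source array, each keeping its original dtype.
# The field names 'label' and 'number' are illustrative.
arr = np.empty(len(a), dtype=[('label', a.dtype), ('number', b.dtype)])
arr['label'] = a
arr['number'] = b

# Same memory, viewed as a record array to allow attribute access
rec = arr.view(np.recarray)

print(arr['number'])  # [0 1 2]
print(rec.number)     # [0 1 2]
```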
A simple solution: convert your data to the object ('O') dtype.
z = np.zeros((2,2), dtype='U2')
o = np.ones((2,1), dtype='O')
np.hstack([o, z])
creates the array:
array([[1, '', ''],
       [1, '', '']], dtype=object)
According to the NumPy docs, there is a function named numpy.lib.recfunctions.merge_arrays which can be used to merge NumPy arrays of different data types into either a structured array or a record array.
Example:
>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> A = np.array([1, 2, 3])
>>> B = np.array(['a', 'b', 'c'])
>>> b = rfn.merge_arrays((A, B))
>>> b
array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('f0', '<i4'), ('f1', '<U1')])
For more detail, please refer to the NumPy documentation for merge_arrays.

Best way to generate all combinations in array that contain certain element in it

I know that I can easily get all the combinations, but is there a way to only get the ones that contain a certain element of the list? I'll give an example.
Let's say I have
arr = ['a','b','c','d']
I want to get all combinations with length (n) containing 'a', for example, if n = 3:
[a, b, c]
[a, b, d]
[a, c, d]
I want to know if there is a better way to get it without generating all combinations. Any help would be appreciated.
I would proceed as follows:
Remove 'a' from the array
Generate all combinations of 2 elements from the reduced array
For each combination, insert the 'a' in all three possible places
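The steps above can be sketched as follows. Because order does not matter within a combination, this sketch simply puts the required element first instead of inserting it at every position. The name combinations_containing is hypothetical, and the sketch assumes the required element occurs in the list.

```python
from itertools import combinations

def combinations_containing(items, required, n):
    """All length-n combinations of items that include `required`."""
    # Step 1: remove the required element
    rest = [x for x in items if x != required]
    # Step 2: choose the other n - 1 elements from what is left
    # Step 3: add the required element back to each choice
    return [[required] + list(combo) for combo in combinations(rest, n - 1)]

arr = ['a', 'b', 'c', 'd']
print(combinations_containing(arr, 'a', 3))
# [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'd']]
```

Only the wanted combinations are generated, so nothing has to be filtered out afterwards.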
You can use a combination of itertools and a list comprehension, like this:
import itertools
arr = ['a', 'b', 'c', 'd']
temp = itertools.combinations(arr, 3)
result = [list(i) for i in list(temp) if 'a' in i]
print(result)
output:
[['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'd']]

Concatenate rows of two dimensional list elements in a list

I want to reorganize two-dimensional list elements in a list (here two elements):
[[['A','B','C'],
['G','H','I']],
[['D','E','F'],
['J','K','L']]]
to become:
[['A','B','C','D','E','F'],
['G','H','I','J','K','L']]
Is there a better way to write this, than the one expressed by the following function?
def joinTableColumns(tableColumns):
    """
    fun([[['A','B','C'],
          ['G','H','I']],
         [['D','E','F'],
          ['J','K','L']]]) --> [['A', 'B', 'C', 'D', 'E', 'F'],
                                ['G', 'H', 'I', 'J', 'K', 'L']]
    """
    tableData = []
    for i, tcol in enumerate(tableColumns):
        for j, line in enumerate(tcol):
            if i == 0:
                tableData.append(line)
            else:
                tableData[j] += line
    return tableData
This assumes that the number of rows to join is equal, which can be checked with:
tdim_test = [(len(x), [len(y) for y in x][0] )for x in tableData]
len(list(set([x[0] for x in tdim_test])))==1
How can I increase robustness of that function? Or, is there something from a standard library that I should use instead?
Yes, you can use the zip() function and itertools.chain() within a list comprehension:
In [17]: lst = [[['A','B','C'],
                 ['G','H','I']],
                [['D','E','F'],
                 ['J','K','L']]]
In [18]: from itertools import chain
In [19]: [list(chain.from_iterable(i)) for i in zip(*lst)]
Out[19]: [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
Or as a pure functional approach you can use itertools.starmap() and operator.add():
In [22]: from itertools import starmap
In [23]: from operator import add
In [24]: list(starmap(add, zip(*lst)))
Out[24]: [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
Another option is functools.reduce(), where matrix is the nested input list:
import functools
[functools.reduce(lambda x, y: x + y, i, []) for i in zip(*matrix)]
This will give you what you want.
You could just use the zip function, unpacking the table inside it and add the pairs:
table = [[['A','B','C'], ['G','H','I']],
[['D','E','F'], ['J','K','L']]]
res = [t1 + t2 for t1, t2 in zip(*table)]
which yields your wanted result:
[['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
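On the robustness part of the question, two things are worth noting: zip() silently stops at the shortest block, and the original joinTableColumns extends the rows of the first block in place, so it modifies its input. Here is a minimal sketch that checks the row counts first and builds new lists. The name join_table_columns and the choice of ValueError are assumptions for illustration.

```python
from itertools import chain

def join_table_columns(table_columns):
    """Concatenate the rows of several 2-D blocks side by side.

    Raises ValueError if the blocks do not all have the same number of rows.
    """
    row_counts = {len(block) for block in table_columns}
    if len(row_counts) > 1:
        raise ValueError("blocks have differing row counts: %s" % sorted(row_counts))
    # zip(*...) groups the i-th row of every block; chain flattens each group
    # into a new list, leaving the input untouched
    return [list(chain.from_iterable(rows)) for rows in zip(*table_columns)]

table = [[['A', 'B', 'C'], ['G', 'H', 'I']],
         [['D', 'E', 'F'], ['J', 'K', 'L']]]
print(join_table_columns(table))
# [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
```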

Matlab - How to compare values in a cell array?

I have a set of inputs and one output declared in a cell array like this:
A = {'a', 'f', 'c', 'b';
'b', 'f', 'c', 'a';
'a', 'f', 'b', 'c';
'c', 'f', 'b', 'a';
'c', 'f', 'a', 'b';
'b', 'f', 'a', 'c' }
where the first column is the output and the remaining columns are the inputs used for that output.
I need to compare the values to reduce the calculation time.
For rows with equal outputs, I want to know whether the inputs are the same. One important remark: the order of the values doesn't matter, so f c b is considered the same as f b c.
I need this because my actual data set is a 5040 x 7 cell array, and I need to pass the rows to an interpolation function.
I thought of something like this:
if the value in the output column equals another value in the same column, check whether the inputs are all the same, using the ismember function.
But I cannot come up with code that works.
Any help, please?
First, since you don't care about the order of the inputs, I would sort each of the rows:
[T, N] = size(A);
for t = 1:T
    Asorted(t,1) = A(t,1);
    Asorted(t,2:N) = sort(A(t,2:N));
end
Now you want to find all of the duplicate rows. A simple way to do this is first to convert to a character array, and use the unique function --
B = cell2mat(Asorted);
[C, ii, jj] = unique(B,'rows');
Now C contains the unique rows of B, ii contains the indexes of the unique rows, and jj labels each of the rows of B depending on which unique value it has.
If you wanted to filter out all of the duplicate rows from A, you can now do
Afiltered = A(ii, :);
This results in:
Afiltered =
'a' 'f' 'b' 'c'
'b' 'f' 'a' 'c'
'c' 'f' 'a' 'b'
