I want to reorganize the two-dimensional lists inside a list (here, two of them):
[[['A','B','C'],
  ['G','H','I']],
 [['D','E','F'],
  ['J','K','L']]]
to become:
[['A','B','C','D','E','F'],
 ['G','H','I','J','K','L']]
Is there a better way to write this, than the one expressed by the following function?
def joinTableColumns(tableColumns):
    """
    fun([[['A','B','C'],
          ['G','H','I']],
         [['D','E','F'],
          ['J','K','L']]]) --> [['A', 'B', 'C', 'D', 'E', 'F'],
                                ['G', 'H', 'I', 'J', 'K', 'L']]
    """
    tableData = []
    for i, tcol in enumerate(tableColumns):
        for j, line in enumerate(tcol):
            if i == 0:
                tableData.append(line)
            else:
                tableData[j] += line
    return tableData
Considering that the number of rows to join must be equal, I currently check that like this:
tdim_test = [(len(x), [len(y) for y in x][0]) for x in tableColumns]
len(list(set([x[0] for x in tdim_test]))) == 1
How can I increase the robustness of that function? Or is there something from a standard library that I should use instead?
Yes, you can use the zip() function and itertools.chain.from_iterable() within a list comprehension:
In [17]: lst = [[['A','B','C'],
['G','H','I']],
[['D','E','F'],
['J','K','L']]]
In [18]: from itertools import chain
In [19]: [list(chain.from_iterable(i)) for i in zip(*lst)]
Out[19]: [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
Or as a pure functional approach you can use itertools.starmap() and operator.add():
In [22]: from itertools import starmap
In [23]: from operator import add
In [24]: list(starmap(add, zip(*lst)))
Out[24]: [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
You can also use functools.reduce within a list comprehension (using the same lst as above):
import functools
[functools.reduce(lambda x, y: x + y, i, []) for i in zip(*lst)]
This will give you what you want.
You could just use the zip function, unpacking the table inside it, and add the pairs:
table = [[['A','B','C'], ['G','H','I']],
[['D','E','F'], ['J','K','L']]]
res = [t1 + t2 for t1, t2 in zip(*table)]
which yields your wanted result:
[['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
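On the robustness point from the question: zip() silently truncates to the shortest column group, so a sketch that validates the row counts up front might look like this (the function name and error message here are my own, not from any of the answers above):

```python
from itertools import chain

def join_table_columns(table_columns):
    """Join column groups row-wise, rejecting groups with mismatched row counts."""
    row_counts = {len(group) for group in table_columns}
    if len(row_counts) > 1:
        raise ValueError("column groups have differing row counts: %s" % row_counts)
    # zip(*...) pairs up corresponding rows; chain flattens each pair.
    return [list(chain.from_iterable(rows)) for rows in zip(*table_columns)]

lst = [[['A', 'B', 'C'], ['G', 'H', 'I']],
       [['D', 'E', 'F'], ['J', 'K', 'L']]]
print(join_table_columns(lst))
# [['A', 'B', 'C', 'D', 'E', 'F'], ['G', 'H', 'I', 'J', 'K', 'L']]
```

The explicit check trades zip()'s silent truncation for a loud failure, which is usually what you want for table data.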
As the title says, I'd like to know the maximum length of the cursorMark value that I receive from Solr.
It would also be nice to get some info about the characters that can appear in it, but just the max length would already be nice. Does it even have one, or can it theoretically grow without limit?
Regarding the Set of Characters:
Looking at the Solr CursorMark source code, we can see that the representation of the cursor mark is a Base64 encoded String.
The specific implementation of Base64 used here is in Solr's Base64 utility class. Here we can see their character set is:
private static final char intToBase64[] = {
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
};
There may also be = symbols if strings are padded. But I don't recall seeing those.
Regarding the Length:
The size will vary depending on the specific data being encoded (sufficient to identify a sort spec/position).
So, based on that, I can only offer an anecdotal observation: the order of magnitude is bytes, not kilobytes.
Final note: This is all behind-the-scenes stuff - and, as such, may be subject to change without warning.
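To make the character-set observation above concrete, here is a quick Python sketch that checks a string against the standard Base64 alphabet shown in that table (the sample mark below is made up for illustration, not a real Solr value):

```python
import re

# Standard (non-URL-safe) Base64 alphabet plus optional '=' padding,
# matching the intToBase64 table in Solr's Base64 utility class.
BASE64_RE = re.compile(r'^[A-Za-z0-9+/]+={0,2}$')

def looks_like_cursor_mark(mark):
    """Heuristic only: True if the string is drawn from the Base64 alphabet."""
    return bool(BASE64_RE.match(mark))

print(looks_like_cursor_mark('AoEjR0xB'))     # True  (hypothetical mark)
print(looks_like_cursor_mark('not base64!'))  # False
```

This is just a sanity check on the alphabet, not a guarantee that a string is a valid cursor mark; as noted above, the internals may change without warning.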
I have been trying to iterate through an array. Below is the code.
x = ['lemon', 'tea', 'water']

def randomShuffle(arr, n):
    from random import choices
    newList = []
    for item in arr:
        r = choices(arr, k=n)
        if r.count(item) <= 2:
            newList.append(item)
    return newList
I would like to know the logic for writing it, please. Thank you all.
Use a while loop: if every item is to appear twice, then the resulting array should be twice the length of the input one.
And of course check not to add the same item more than twice to the result ;)
choices() returns a list (of length 1 by default), so I use [0] to get the element:
from random import choices

xx = ["a", "b", "c"]

def my_function(x):
    res = []
    while len(res) < len(x) * 2:
        c = choices(x)[0]
        if res.count(c) < 2:
            res.append(c)
    return res
my_function(xx)
> ['c', 'c', 'a', 'b', 'a', 'b']
my_function(xx)
> ['a', 'b', 'b', 'a', 'c', 'c']
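A deterministic alternative to the rejection loop above (my own sketch, not part of the answer): since each item should appear exactly twice, you can double the list and shuffle it, which always terminates in one pass:

```python
from random import shuffle

def double_and_shuffle(arr):
    """Return a shuffled list containing each element of arr exactly twice."""
    doubled = arr * 2      # every element now appears exactly twice
    shuffle(doubled)       # in-place Fisher-Yates shuffle
    return doubled

result = double_and_shuffle(['lemon', 'tea', 'water'])
print(result)  # e.g. ['tea', 'water', 'lemon', 'tea', 'lemon', 'water']
```

Unlike the while loop, this never discards draws, so its cost is linear in the output size.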
I have an array in numpy which looks like this:
myarray = ['a', 'b', 'c', 'd', 'e', 'f']
I would like to return an array of indices for 'b', 'c', 'd' which looks like this:
myind = [1,2,3]
I need this index array later to use in a loop. I am using Python 2.7. Thanks, folks.
You can use np.searchsorted -
In [61]: myarray = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
In [62]: search = np.array(['b', 'c', 'd'])
In [63]: np.searchsorted(myarray, search)
Out[63]: array([1, 2, 3])
If myarray is not alphabetically sorted, we need to use the additional argument sorter with it, like so -
In [64]: myarray = np.array(['a', 'd', 'b', 'e', 'c', 'f'])
In [65]: search = np.array(['b', 'c', 'd'])
In [67]: sidx = np.argsort(myarray)
In [69]: sidx[np.searchsorted(myarray, search, sorter=sidx)]
Out[69]: array([2, 4, 1])
If your array is sorted and does not contain any duplicates, then np.searchsorted should do the trick. If your array contains duplicates, then you have to use np.argwhere.
Examples:
input_array = np.array(['a','b','c','d','e','f','a'])
search = np.array(['a','b','c'])
np.searchsorted(input_array, search)
output >> array([0, 1, 2])
np.argwhere(input_array == 'a')
output >> array([[0],[6]])
For a more general solution, you can do:
np.concatenate( (np.argwhere(input_array == 'a') ,
np.argwhere(input_array == 'b'),
np.argwhere(input_array == 'c') ) )
output >> array([[0],[6],[1],[2]])
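If you want all matching positions in one shot, without repeating np.argwhere per value, a sketch using np.isin (np.in1d on older NumPy releases) might look like this:

```python
import numpy as np

input_array = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'a'])
search = np.array(['a', 'b', 'c'])

# Boolean mask marking positions whose value is in `search`, duplicates included.
mask = np.isin(input_array, search)   # np.in1d(input_array, search) on older NumPy
myind = np.where(mask)[0]
print(myind)  # [0 1 2 6]
```

This scans the array once and keeps duplicate positions, but note the indices come back in array order, not in the order of `search`.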
The following code works but does multiple passes over the entire array, which I would like to avoid. Another alternative would have been to sort the named_coords array by name and then gather the pieces while iterating through the sorted array, but I didn't find a clean way to make that work. Ideally the answer would use standard adapters and such to transform the collection as a whole.
use std::collections::HashMap;

fn main() {
    let p = [
        ['I', 'P', 'P', 'Y', 'Y', 'Y', 'Y', 'V', 'V', 'V'],
        ['I', 'P', 'P', 'X', 'Y', 'L', 'L', 'L', 'L', 'V'],
        ['I', 'P', 'X', 'X', 'X', 'F', 'Z', 'Z', 'L', 'V'],
        ['I', 'T', 'W', 'X', 'F', 'F', 'F', 'Z', 'U', 'U'],
        ['I', 'T', 'W', 'W', 'N', 'N', 'F', 'Z', 'Z', 'U'],
        ['T', 'T', 'T', 'W', 'W', 'N', 'N', 'N', 'U', 'U'],
    ];

    // Gather named coordinates into a Vec.
    let mut named_coords = Vec::new();
    for (n0, j0) in p.iter().enumerate() {
        for (n1, j1) in j0.iter().enumerate() {
            named_coords.push(((n0, n1), *j1));
        }
    }

    // Transform the named coordinates into a vector of names.
    let mut names = named_coords.iter().map(|x| x.1).collect::<Vec<_>>();
    names.sort();
    names.dedup();

    // Filter the named coordinates by name and collect results.
    // Inefficient - iterates over the entire named_coords vector multiple times.
    let mut pieces = HashMap::new();
    for name in names {
        pieces.insert(name, named_coords.iter().filter(|&p| p.1 == name).map(|p| p.0).collect::<Vec<_>>());
    }

    // Print out results.
    for n in pieces.iter() {
        for coord in n.1.iter() {
            println!("{} {} {}", n.0, coord.0, coord.1);
        }
    }
}
Use the entry API:
use std::collections::HashMap;

fn main() {
    let p = [
        ['I', 'P', 'P', 'Y', 'Y', 'Y', 'Y', 'V', 'V', 'V'],
        ['I', 'P', 'P', 'X', 'Y', 'L', 'L', 'L', 'L', 'V'],
        ['I', 'P', 'X', 'X', 'X', 'F', 'Z', 'Z', 'L', 'V'],
        ['I', 'T', 'W', 'X', 'F', 'F', 'F', 'Z', 'U', 'U'],
        ['I', 'T', 'W', 'W', 'N', 'N', 'F', 'Z', 'Z', 'U'],
        ['T', 'T', 'T', 'W', 'W', 'N', 'N', 'N', 'U', 'U'],
    ];

    let mut pieces = HashMap::new();
    for (n0, j0) in p.iter().enumerate() {
        for (n1, j1) in j0.iter().enumerate() {
            // entry() does a single lookup per item, creating the Vec on first sight.
            pieces.entry(j1).or_insert_with(Vec::new).push((n0, n1));
        }
    }
    println!("{:?}", pieces);
}
Efficient: A single pass through the data and a single hash lookup per item.
Simple: beauty is in the eye of the beholder.
I’m trying to get a list of objects from an AWS S3 bucket using boto. The list is made of the common elements of two different lists, and I want it sorted by each object's last_modified date in ascending order, so the oldest objects come first. I want to prepare a list of 5 elements this way, process only the files that belong to that list, eventually delete those files, and then pick up the next 5 elements the same way.
Here is the bucket hierarchy:
//ship-my-data/outputs/444556677788.tar.gz
//ship-my-data/outputs/444556677788.tar.gz
//ship-my-data/outputs/345345345353.tar.gz
//ship-my-data/outputs1/ctrlFiles/444556677788.ctrl.tar.gz
//ship-my-data/outputs1/ctrlFiles/123222333444.ctrl.tar.gz
//ship-my-data/outputs1/ctrlFiles/769797977979.ctrl.tar.gz
I want to make a list of common elements from both the folder above i.e. from outputs1 & ctrlFiles folder.
Here is my code:
bucket = LogShip._aws_connection.get_bucket(aws_bucket_to_download)  # connect to the AWS s3 bucket
bucket_list_ctrl = bucket.list(prefix='outputs/ctrlFiles/', delimiter='/')  # get the bucket list for control files
ctrl_list = sorted(bucket_list_ctrl, key=lambda item1: item1.last_modified)  # sort the list by last_modified date
bucket_list_tar = bucket.list(prefix='outputs/', delimiter='/')  # get the list for tar files
tar_list = sorted(bucket_list_tar, key=lambda item2: item2.last_modified)  # supposed to sort the bucket list, but throws: AttributeError: 'Prefix' object has no attribute 'last_modified'
for item_c in ctrl_list:
    ctrlName = str(item_c.name).split("/")[2].replace(".ctrl.tar.gz", "")  # control file name: 1444447203130120001
    for item_t in bucket_list_tar:
        tarName = str(item_t.name).split("/")[1].replace(".tar.gz", "")  # tar file name: 1444447203130120001
        # now from the above two lists I want to prepare a master list of common elements and pick up only 5 elements to proceed further
        j = 5
        while j <= 5:
            for elem in ctrlName:
                for elem in tarName:
                    master_list.append(elem)
            j = j + 1
print master_list
Output:
['c', 't', 'r', 'l', 'F', 'i', 'l', 'e', 's', 'c', 't', 'r', 'l', 'F', 'i', 'l', 'e', 's', 'c', 't', 'r', 'l', 'F', 'i', 'l', 'e', 's', 'c', 't', 'r', 'l', 'F', 'i', 'l', 'e', 's', 'c', 't', 'r', 'l', 'F', 'i', 'l', 'e', 's', 'c', 't', 'r', 'l', 'F']
Expected output:
[444556677788, 123222333444]
Can anyone please help me understand where I'm making a mistake?
I'm not sure why you want to do things in groups of five, so this code matches all files at once:
import boto
import re
conn = boto.connect_s3('REGION')
bucket = conn.get_bucket('BUCKETNAME')
list = bucket.list()
# Get two lists of files
bucket_list_ctrl = bucket.list(prefix='outputs/ctrlFiles/', delimiter='/')
bucket_list_tar = bucket.list(prefix='outputs/', delimiter='/')
# Extract filenames and modified date
pattern = re.compile(r'.*?(\d+).*?')
ctrl_files = [(pattern.match(obj.name).group(1), obj.last_modified) for obj in bucket_list_ctrl]
list_files = [pattern.match(obj.name).group(1) for obj in bucket_list_tar if obj.name.endswith('gz')]
# Find filenames that match both
both = [obj for obj in ctrl_files if obj[0] in list_files]
# Give sorted result
result = [f[0] for f in sorted(both, key=lambda obj: obj[1])]
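The intersect-and-sort step of the answer can be shown on plain data, without touching S3. The tuples below are made up for illustration, standing in for the (name, last_modified) pairs that the boto listing would yield:

```python
# Hypothetical (name, last_modified) pairs, as the loops above would produce them.
ctrl_files = [('444556677788', '2015-10-01T10:00:00.000Z'),
              ('123222333444', '2015-09-01T10:00:00.000Z'),
              ('769797977979', '2015-11-01T10:00:00.000Z')]
tar_names = ['444556677788', '345345345353', '123222333444']

# Keep only names present in both listings, sort oldest first, then take five.
# S3 last_modified is ISO 8601, so plain string comparison sorts chronologically.
both = [f for f in ctrl_files if f[0] in tar_names]
master_list = [name for name, _ in sorted(both, key=lambda f: f[1])][:5]
print(master_list)  # ['123222333444', '444556677788']
```

The original code appended to master_list inside `for elem in ctrlName`, which iterates over the characters of a string; that is why the output was a list of single characters instead of file names.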