replace numpy elements with non-scalar dictionary values - arrays

import pandas as pd
import numpy as np
column = np.array([5505, 5505, 5505, 34565, 34565, 65539, 65539])
column = pd.Series(column)
myDict = column.groupby(by = column ).groups
I am creating a dictionary from a pandas object using groupby(by=...).groups, which has the form:
>>> myDict
{5505: Int64Index([0, 1, 2], dtype='int64'), 65539: Int64Index([5, 6], dtype='int64'), 34565: Int64Index([3, 4], dtype='int64')}
I have a numpy array, e.g.
myArray = np.array([34565, 34565, 5505,65539])
and I want to replace each of the array's elements with the dictionary's values.
I have tried several solutions that I have found (e.g. here and here), but those examples use dictionaries with a single scalar value per key, and I always get the error "setting an array element with a sequence". How can I get around this problem?
My intended output is
np.array([3, 4, 3, 4, 0, 1, 2, 5, 6])

One approach based on np.searchsorted -
# Extract dict info
k = list(myDict.keys())
v = list(myDict.values())
# Use argsort of k to find search sorted indices from myArray in keys
# Index into the values of dict based on those indices for output
sidx = np.argsort(k)
idx = sidx[np.searchsorted(k,myArray,sorter=sidx)]
out_arr = np.concatenate([v[i] for i in idx])
Sample input, output -
In [369]: myDict
Out[369]:
{5505: Int64Index([0, 1, 2], dtype='int64'),
34565: Int64Index([3, 4], dtype='int64'),
65539: Int64Index([5, 6], dtype='int64')}
In [370]: myArray
Out[370]: array([34565, 34565, 5505, 65539])
In [371]: out_arr
Out[371]: array([3, 4, 3, 4, 0, 1, 2, 5, 6])
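For small inputs, a plain dictionary lookup may be enough and avoids the sorting step. The following is a minimal sketch of that alternative, not the searchsorted method above. It assumes every element of the array is a key of the dictionary, and it rebuilds the groups by hand as NumPy arrays instead of calling pandas (the names groups and my_array are illustrative).

```python
import numpy as np

# Hand-built stand-in for column.groupby(column).groups (assumption:
# plain integer arrays instead of pandas index objects).
groups = {
    5505: np.array([0, 1, 2]),
    34565: np.array([3, 4]),
    65539: np.array([5, 6]),
}
my_array = np.array([34565, 34565, 5505, 65539])

# Look up each element's group, then join the variable-length pieces.
# .tolist() converts NumPy scalars to plain ints before the lookup.
out_arr = np.concatenate([groups[key] for key in my_array.tolist()])
print(out_arr.tolist())  # [3, 4, 3, 4, 0, 1, 2, 5, 6]
```

A key that is missing from the dictionary raises KeyError here, whereas searchsorted returns an insertion position that may point at a different key.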

Related

Iterating over for an Array Column with dynamic size in Spark Scala Dataframe

I am familiar with this approach - case in point an example from How to obtain the average of an array-type column in scala-spark over all row entries per entry?
val array_size = 3
val avgAgg = for (i <- 0 to array_size -1) yield avg($"value".getItem(i))
df.select(array(avgAgg: _*).alias("avg_value")).show(false)
However, the 3 is hard-coded here, whereas in reality the array size varies.
No matter how hard I try to avoid a UDF, I cannot do this type of thing dynamically based on the size of an array column already present in the data frame. E.g.:
...
val z = for (i <- 1 to size($"sortedCol") ) yield array (element_at($"sortedCol._2", i), element_at($"sortedCol._3", i) )
...
...
.withColumn("Z", array(z: _*) )
I am looking for a way to do this against an existing array column that is variable in length. transform? expr? Not sure.
Full code as per request:
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
case class abc(year: Int, month: Int, item: String, quantity: Int)
val df0 = Seq(abc(2019, 1, "TV", 8),
abc(2019, 7, "AC", 10),
abc(2018, 1, "TV", 2),
abc(2018, 2, "AC", 3),
abc(2019, 2, "CO", 10)).toDS()
val df1 = df0.toDF()
// Gen some data, can be done easier, but not the point.
val itemsList= collect_list(struct("month", "item", "quantity"))
// This nn works.
val nn = 3
val z = for (i <- 1 to nn) yield array (element_at($"sortedCol.item", i), element_at($"sortedCol.quantity", i) )
// But want this.
//val z = for (i <- 1 to size($"sortedCol") ) yield array (element_at($"sortedCol.item", i), element_at($"sortedCol.quantity", i) )
val df2 = df1.groupBy($"year")
.agg(itemsList as "items")
.withColumn("sortedCol", sort_array($"items", asc = true))
.withColumn("S", size($"sortedCol")) // cannot use this either
.withColumn("Z", array(z: _*) )
.drop("items")
.orderBy($"year".desc)
df2.show(false)
// Col Z is the output I want, but not the null value Array
UPD
In "In apache spark SQL, how to remove the duplicate rows when using collect_list in window function?" I solved this with a very simple UDF, but I was looking for a way without a UDF, and in particular for a way to set the "to" value in the for loop dynamically. The answer proves that certain constructs are not possible, which was the verification being sought.
If I understand your need correctly, you can simply use the transform function like this:
val df2 = df1.groupBy($"year")
.agg(itemsList as "items")
.withColumn("sortedCol", sort_array($"items", asc = true))
val transform_expr = "transform(sortedCol, x -> array(x.item, x.quantity))"
df2.withColumn("Z", expr(transform_expr)).show(false)
//+----+--------------------------------------+--------------------------------------+-----------------------------+
//|year|items |sortedCol |Z |
//+----+--------------------------------------+--------------------------------------+-----------------------------+
//|2018|[[1, TV, 2], [2, AC, 3]] |[[1, TV, 2], [2, AC, 3]] |[[TV, 2], [AC, 3]] |
//|2019|[[1, TV, 8], [7, AC, 10], [2, CO, 10]]|[[1, TV, 8], [2, CO, 10], [7, AC, 10]]|[[TV, 8], [CO, 10], [AC, 10]]|
//+----+--------------------------------------+--------------------------------------+-----------------------------+
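The for comprehension in the question cannot take size($"sortedCol") as its upper bound, because that bound has to be a plain integer known when the query is built, while size(...) is a column expression whose value exists only per row at execution time. transform sidesteps this by applying the lambda to every element of each row's array. As a rough plain-Python model of that per-row behaviour (this is not Spark code; the rows data and the transform helper are illustrative stand-ins):

```python
# Plain-Python model of Spark SQL's transform(array, x -> expr).
# Each row carries its own array, so no global array size is needed.
rows = {
    2018: [(1, "TV", 2), (2, "AC", 3)],
    2019: [(1, "TV", 8), (2, "CO", 10), (7, "AC", 10)],
}

def transform(array, fn):
    """Apply fn to every element of one row's array."""
    return [fn(x) for x in array]

# x is a (month, item, quantity) tuple; keep item and quantity.
# Quantities are rendered as strings on the assumption that array()
# coerces mixed string/int elements to a common string type.
z = {
    year: transform(arr, lambda x: [x[1], str(x[2])])
    for year, arr in rows.items()
}
print(z[2018])  # [['TV', '2'], ['AC', '3']]
```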

how to convert a list of arrays to a python list

Given a number of pages, I want to split the page range into separate but even-ish lists, which I can do:
import numpy as np
pages = 7
threads = 3
list_of_pages = range(1,pages+1)
page_list = [*np.array_split(list_of_pages, threads)]
Returns:
[array([1, 2, 3]), array([4, 5]), array([6, 7])]
I would like it to return a list of lists instead, ie:
[[1,2,3],[4,5],[6,7]]
I was hoping to do something like this (the line below doesn't work):
page_list = np.array[*np.array_split(list_of_pages, threads)].tolist()
Is that possible, or do I need to loop through and convert each array?
Assuming page_list is a list of ndarrays, you can convert it to a list of Python lists like this:
[x.tolist() for x in page_list]
# [[1, 2, 3], [4, 5], [6, 7]]
@zihaozhihao gave an inline solution. You can also write a for loop to iterate over every item in page_list and convert it to a list:
list_of_lists = []
for x in page_list:
    list_of_lists.append(x.tolist())
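If NumPy is used here only for the split, a pure-Python split produces lists directly and needs no conversion. The sketch below uses a hypothetical helper, split_evenly, which is not part of NumPy or the standard library. It follows the same convention as np.array_split: the larger chunks come first.

```python
def split_evenly(items, parts):
    """Split items into `parts` consecutive lists whose lengths differ by at most 1."""
    items = list(items)
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

pages = 7
threads = 3
page_list = split_evenly(range(1, pages + 1), threads)
print(page_list)  # [[1, 2, 3], [4, 5], [6, 7]]
```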

How to insert value from an array into an other array in Ruby?

I have two arrays:
a = [a_first_element, a_second_element, a_third_element, a_fourth_element]
b = [b_first_element, b_second_element, b_third_element, b_fourth_element]
I would like to insert the elements of the second array into the first array at the even positions.
So the final array should look like:
[a_first_element, b_first_element, a_second_element, b_second_element, a_third_element,b_third_element, etc]
The arrays have the same number of items (around 30).
How could I do that?
It looks like you want to zip the arrays together. Doing this:
a = [1, 2, 3, 4]
b = [111, 222, 333, 444]
c = a.zip(b)
will set c to:
[[1, 111], [2, 222], [3, 333], [4, 444]]
which is almost what you want, but you probably don't want the nested arrays. To get rid of the nested arrays, just call flatten (or flatten(1) if the elements may themselves be arrays, since flatten works recursively):
c = a.zip(b).flatten()
Now c is set to:
[1, 111, 2, 222, 3, 333, 4, 444]

Splitting an Array into Sub-Arrays in Swift [duplicate]

Problem
Given an array of values how can I split it into sub-arrays made of elements that are equal?
Example
Given this array
let numbers = [1, 1, 1, 3, 3, 4]
I want this output
[[1,1,1], [3, 3], [4]]
What I am NOT looking for
A possible way of solving this would be creating some sort of index to indicate the occurrences of each element like this.
let indexes = [1:3, 3:2, 4:1]
And finally use the index to rebuild the output array.
let subsequences = indexes.sort { $0.0.0 < $0.1.0 }.reduce([Int]()) { (res, elm) -> [Int] in
return res + [Int](count: elm.1, repeatedValue: elm.0)
}
However, with this solution I am losing the original values. Of course, in this case it's not a big problem (an Int value is still an Int even if recreated), but I would like to apply this solution to more complex data structures like this:
struct Starship: Equatable {
let name: String
let warpSpeed: Int
}
func ==(left:Starship, right:Starship) -> Bool {
return left.warpSpeed == right.warpSpeed
}
Final considerations
The function I am looking for would be some kind of reverse of flatten(). In fact:
let subsequences: [[Int]] = [[1,1,1], [3, 3], [4]]
print(Array(subsequences.flatten())) // [1, 1, 1, 3, 3, 4]
I hope I made myself clear, let me know should you need further details.
// extract unique numbers using a set, then
// map sub-arrays of the original arrays with a filter on each distinct number
let numbers = [1, 1, 1, 3, 3, 4]
let numberGroups = Set(numbers).map{ value in return numbers.filter{$0==value} }
print(numberGroups)
[EDIT] changed to use Set Initializer as suggested by Hamish
[EDIT2] Swift 4 added an initializer to Dictionary that will do this more efficiently:
let numberGroups = Array(Dictionary(grouping:numbers){$0}.values)
For a list of objects to be grouped by one of their properties:
let objectGroups = Array(Dictionary(grouping:objects){$0.property}.values)
If you could use CocoaPods/Carthage/Swift Package Manager/etc. you could use packages like oisdk/SwiftSequence which provides the group() method:
numbers.lazy.group()
// should return a sequence that generates [1, 1, 1], [3, 3], [4].
or UsrNameu1/TraverSwift which provides groupBy:
groupBy(SequenceOf(numbers), ==)
If you don't want to add external dependencies, you could always write an algorithm like:
func group<S: SequenceType where S.Generator.Element: Equatable>(seq: S) -> [[S.Generator.Element]] {
    var result: [[S.Generator.Element]] = []
    var current: [S.Generator.Element] = []
    for element in seq {
        if current.isEmpty || element == current[0] {
            current.append(element)
        } else {
            result.append(current)
            current = [element]
        }
    }
    result.append(current)
    return result
}
group(numbers)
// returns [[1, 1, 1], [3, 3], [4]].
Let's assume that you have an unsorted array of items. You will need to sort the initial array first, which gives you something like this:
[1, 1, 1, 3, 3, 4]
After that, initialize two arrays: one for storing the resulting sub-arrays and another to use as the current sub-array.
Loop through the sorted array and:
if the current value is the same as the last one, push it to the current array
otherwise push the current array to the result array, then start a new current array containing the current value.
When the loop ends, push the final current array to the result as well.
Hope it helps!
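The sort-then-group steps described above can be sketched in Python as follows. The helper name group_sorted and its key parameter are illustrative choices, and the ship names in the second call are made-up sample data. Grouping by a key keeps the original items, which matters for structures like Starship where equal elements are not identical.

```python
def group_sorted(items, key=lambda x: x):
    """Sort items by key, then collect runs of equal keys into sub-lists."""
    groups = []
    current = []
    for item in sorted(items, key=key):
        if current and key(item) != key(current[-1]):
            # Key changed: close the current group and start a new one.
            groups.append(current)
            current = []
        current.append(item)
    if current:
        # Push the final group (skipped when the input is empty).
        groups.append(current)
    return groups

print(group_sorted([3, 1, 4, 1, 3, 1]))  # [[1, 1, 1], [3, 3], [4]]

# (name, warp_speed) pairs grouped by warp speed; the names survive.
ships = [("Enterprise", 9), ("Voyager", 9), ("Defiant", 7)]
by_speed = group_sorted(ships, key=lambda s: s[1])
print(by_speed)  # [[('Defiant', 7)], [('Enterprise', 9), ('Voyager', 9)]]
```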
Worth mentioning, using Swift Algorithms this is now a one-liner:
import Algorithms
let numbers = [1, 1, 1, 3, 3, 4]
let chunks: [[Int]] = numbers.chunked(by: ==).map { .init($0) }
print(chunks) // [[1, 1, 1], [3, 3], [4]]

Turn array into array of arrays following structure of another array

I would like to turn an array into an array of arrays following another array of arrays. I'm not sure how to do this, here are the arrays:
orig_array = [[0,1],[4],[3],[],[3,2,6],[]]
my_array = [2,0,1,3,3,4,5]
wanted_array = [[2,0],[1],[3],[],[3,4,5],[]]
I would like to keep the empty arrays.
Thanks
Get the length of each element in orig_array, take the cumulative sum of those lengths to get the indices at which my_array needs to be split, and finally use np.split to perform the splitting. Thus, the implementation would look something like this -
lens = [len(item) for item in orig_array]
out = np.split(my_array,np.cumsum(lens))[:-1]
Sample run -
In [72]: orig_array = np.array([[0,1],[4],[3],[],[3,2,6],[]], dtype=object)  # ragged, so dtype=object
...: my_array = np.array([2,0,1,3,3,4,5])
...:
In [73]: lens = [len(item) for item in orig_array]
...: out = np.split(my_array,np.cumsum(lens))[:-1]
...:
In [74]: out
Out[74]:
[array([2, 0]),
array([1]),
array([3]),
array([], dtype=int64),
array([3, 4, 5]),
array([], dtype=int64)]
def do(format, values):
    # Recurse into nested lists; at a leaf, take the next value from the front.
    if isinstance(format, list):
        return [do(v, values) for v in format]
    else:
        return values.pop(0)
print(do(orig_array, my_array))
Note: this destroys the array where the values come from.
You could do the following:
import copy
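def reflect_array(orig_array, order):
    # Copy the structure, then overwrite each slot with the next value in order.
    wanted_array = copy.deepcopy(orig_array)
    for i, part_list in enumerate(orig_array):
        for j, _ in enumerate(part_list):
            # pop(0) takes from the front; pop() would fill the slots in reverse.
            wanted_array[i][j] = order.pop(0)
    return wanted_array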
Test run:
orig_array = [[0,1],[4],[3],[],[3,2,6],[]]
my_array = [2,0,1,3,3,4,5]
print(reflect_array(orig_array, my_array))
# [[2, 0], [1], [3], [], [3, 4, 5], []]
In [858]: my_array = [2,0,1,3,3,4,5]
In [859]: [[my_array.pop(0) for _ in range(len(x))] for x in orig_array]
Out[859]: [[2, 0], [1], [3], [], [3, 4, 5], []]
If you don't want to change my_array, make a copy first with b = my_array[:] and pop from b instead.
This operates on the same principle as @karoly's answer; it is just more direct because it assumes only one level of nesting.
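The pop-based versions above consume the list they read from. A variant that leaves the input untouched can draw the values from an iterator instead. This is a minimal sketch assuming one level of nesting and enough values to fill the template; the name reshape_like is illustrative.

```python
def reshape_like(template, values):
    """Regroup values into sub-lists with the same lengths as template's sub-lists."""
    it = iter(values)  # consume values in order without modifying the input
    return [[next(it) for _ in part] for part in template]

orig_array = [[0, 1], [4], [3], [], [3, 2, 6], []]
my_array = [2, 0, 1, 3, 3, 4, 5]

wanted_array = reshape_like(orig_array, my_array)
print(wanted_array)  # [[2, 0], [1], [3], [], [3, 4, 5], []]
print(my_array)      # [2, 0, 1, 3, 3, 4, 5]
```

Besides preserving my_array, this avoids the repeated pop(0) calls, each of which shifts the remaining elements of the list.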
