Applying transform_lookup on datasets with different numbers of rows - maps

I am currently learning Altair's maps feature, and while looking into one of the examples (https://altair-viz.github.io/gallery/airport_connections.html), I noticed that the datasets (airports.csv and flights-airport.csv) have different numbers of rows. Is it possible to apply transform_lookup even if that's the case?

Yes, it is possible to apply transform_lookup to datasets with different numbers of rows. The lookup transform amounts to a one-sided (left) join on a specified key column: regardless of how many rows each dataset has, each row of the main dataset is joined with the first matching row in the lookup data. Main rows without a match are kept (the looked-up fields default to null), and lookup rows that match nothing are simply ignored.
A simple example to demonstrate this:
import altair as alt
import pandas as pd

# main data: 3 rows
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'x': [1, 2, 3]
})
# lookup data: 4 rows; 'D' has no partner in df1 and is ignored
df2 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'y': [1, 2, 3, 4]
})
alt.Chart(df1).transform_lookup(
    lookup='key',  # field in df1 used for matching
    from_=alt.LookupData(df2, key='key', fields=['y'])  # match on df2['key'], pull in 'y'
).mark_bar().encode(
    x='x:Q',
    y='y:O',
    color='key:N'
)
More information is available in the Lookup transform docs.
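To see which rows survive, the same join can be reproduced in plain pandas. This is only an analogue for inspection (Vega-Lite performs the real lookup in the browser, not via pandas): keeping the first row per key mirrors "first match wins", and a left merge keeps every row of the main data.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'y': [1, 2, 3, 4]})

# keep only the first row per key, mirroring the lookup's "first match" rule
lookup = df2.drop_duplicates(subset='key', keep='first')[['key', 'y']]
# left join: all 3 rows of df1 survive; the unmatched 'D' row of df2 is dropped
joined = df1.merge(lookup, on='key', how='left')
print(joined)
```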

Related

Numpy arrays inside a pandas dataframe - how to normalize the values, keeping the original structure?

I have arrays as cells in a dataframe. The arrays are of 2 columns, a value and a category, and their lengths, meaning the number of rows, differ.
Here's a simple example of the situation with just one column:
import pandas as pd
import numpy as np
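# note: mixing numbers and letters makes NumPy store everything as strings ('1', '2', ...)
arr1 = np.array([[1, 2, 3], ['a', 'b', 'c']])
arr2 = np.array([[2, 3], ['a', 'b']])
df1 = pd.DataFrame(index=np.arange(0, 2), columns=['column1'])
# .at sets a single cell directly; chained df1.iloc[0][0] = ... may not modify df1
df1.at[0, 'column1'] = arr1
df1.at[1, 'column1'] = arr2
The resulting df1 is: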
0 [[1, 2, 3], [a, b, c]]
1 [[2, 3], [a, b]]
What I want is the values normalized across the whole column, added as new columns inside the arrays arr1 and arr2; in this case that means normalizing over [1,2,3,2,3], not over [1,2,3] and [2,3] separately. How can I achieve this? The structure of the dataframe df1 must not change, only what is inside the cells.
Extracting the values to a list and then normalizing them is an easy task, but "putting them back" is where I struggle because of the complex structure. Should I add an index to all the values inside the arrays to pair them up? That sounds slow and unnecessary. Could I somehow create an array of references to the original numeric objects and replace those? But if I do, I would lose the original values... and how would I add them as a new column if I am only referencing the original objects?
I am sure there is an intuitive way of doing this but I just can't articulate it.
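A minimal sketch, assuming min-max normalization (the question does not say which kind) and the layout of the example, where each cell holds values in row 0 and categories in row 1. It pools row 0 of every cell, computes one min and max for the whole column, then rebuilds each cell with the normalized values appended as an extra row. Because each cell is rebuilt in the same pass that reads it, no index bookkeeping is needed to "put the values back". The helpers `make_cells` and `add_normalized_row` are hypothetical names.

```python
import numpy as np
import pandas as pd

def make_cells(arrays):
    """1-D object array holding the given arrays as separate elements,
    so NumPy does not try to merge them into one multi-dimensional array."""
    cells = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        cells[i] = arr
    return cells

def add_normalized_row(series):
    """Min-max normalize row 0 of every cell using the whole column's
    min/max; return a Series of new arrays with the result as an extra row."""
    # the example arrays store numbers as strings, so convert row 0 to float
    values = [cell[0].astype(float) for cell in series]
    combined = np.concatenate(values)       # [1, 2, 3, 2, 3] in the example
    lo, hi = combined.min(), combined.max()
    span = hi - lo if hi > lo else 1.0      # avoid division by zero
    new_cells = []
    for cell, vals in zip(series, values):
        normalized = (vals - lo) / span
        # object dtype lets string categories and float results share one array
        new_cells.append(np.vstack([cell.astype(object), normalized.astype(object)]))
    return pd.Series(make_cells(new_cells), index=series.index)

arr1 = np.array([[1, 2, 3], ['a', 'b', 'c']])
arr2 = np.array([[2, 3], ['a', 'b']])
df1 = pd.DataFrame({'column1': make_cells([arr1, arr2])})
df1['column1'] = add_normalized_row(df1['column1'])
print(df1.at[0, 'column1'])
print(df1.at[1, 'column1'])
```

The dataframe keeps one column and one array per row; only the contents of each cell grow by one row.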

Iterate over 2 arrays to update a Rails database table

ruby on rails
I want to update the Fruit table in my database using information stored in 2 arrays:
fruit_id=[2,8,14,35]
fruit_name=["apple","orange","banana","melon"]
So, for example, the Fruit with id 2 should get the name "apple".
I thought of a for loop:
for i in fruit_id do
Fruit.find(i).update(name:fruit_name)
end
But that only made sense in my head...
I also apologize if this question has already been answered; I'm new to this and don't know the exact term to search for. Thanks a lot!
Try something like this:
fruit_id.zip(fruit_name).each do |id, name|
  fruit = Fruit.find_by(id: id)
  fruit.update_attribute(:name, name) if fruit
end
Here are a couple of hints:
Use Array#zip
To reduce the number of queries, you can use ActiveRecord::Relation#where with ActiveRecord::Relation#update_all
So the resulting code will be:
fruit_ids = [2, 8, 14, 35]
fruit_names = ['apple', 'orange', 'banana', 'melon']
fruits = fruit_ids.zip(fruit_names) # => [[2, 'apple'], [8, 'orange'], [14, 'banana'], [35, 'melon']]
fruits.each do |(id, name)|
  Fruit.where(id: id).update_all(name: name)
end
Caution: update_all skips validations and callbacks (and does not set updated_at), so if you rely on those, it's better to use the initial approach with Fruit.find(id).update(name: name).
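For comparison outside Rails, the same zip-then-update idea can be sketched in Python with the standard-library sqlite3 module. The in-memory `fruits` table is a hypothetical stand-in for the Fruit model; this is plain SQL, not ActiveRecord, and like update_all it bypasses any model validations or callbacks.

```python
import sqlite3

fruit_ids = [2, 8, 14, 35]
fruit_names = ['apple', 'orange', 'banana', 'melon']

# hypothetical table standing in for the Fruit model
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO fruits (id, name) VALUES (?, ?)',
                 [(fid, 'unknown') for fid in fruit_ids])

# zip pairs each id with its name; executemany runs the UPDATE once per pair
conn.executemany('UPDATE fruits SET name = ? WHERE id = ?',
                 [(name, fid) for fid, name in zip(fruit_ids, fruit_names)])
conn.commit()

rows = conn.execute('SELECT id, name FROM fruits ORDER BY id').fetchall()
print(rows)
```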

Need a database for key-array storage with array specific operations like "update union" and sub-array selection

I need a database to store key-array pairs in rows like below:
===== TABLE: shoppingCart =====
user_id - product_ids
1 - [1, 2, 3, 4]
2 - [100, 200, 300, 400]
I want to be able to update a row by merging a new array into the old one while skipping duplicate values, i.e. I want operations like:
UPDATE shoppingCart SET product_ids = UNION(product_ids, [4, 5, 6]) WHERE user_id = 1
to make the first row's product_ids column become:
[1, 2, 3, 4, 5, 6]
I also need operations like selecting a sub-array, e.g.:
SELECT product_ids[0:2] from shoppingCart
which should return:
[1,2]
Any suggestions for the best database for such purposes?
The arrays I need to work with are usually long (about 1,000 - 10,000 long integers, or string versions of long integers).
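To pin down the semantics being asked for, here is an in-memory Python sketch (not a database recommendation) of an order-preserving "union update" and of the 0-based, end-exclusive slice implied by product_ids[0:2] returning [1,2]. The function `union_update` and the `shopping_cart` dict are hypothetical names for illustration.

```python
def union_update(existing, new_items):
    """Append the items of new_items that are not already present,
    keeping the original order (a set makes each membership check O(1))."""
    seen = set(existing)
    merged = list(existing)
    for item in new_items:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged

shopping_cart = {1: [1, 2, 3, 4], 2: [100, 200, 300, 400]}
shopping_cart[1] = union_update(shopping_cart[1], [4, 5, 6])
print(shopping_cart[1])
print(shopping_cart[1][0:2])  # 0-based start, exclusive end
```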

Best way to generate all combinations in array that contain certain element in it

I know that I can easily get all the combinations, but is there a way to get only the ones that contain a certain element of the list? I'll give an example.
Let's say I have
arr = ['a','b','c','d']
I want to get all combinations of length n containing 'a', for example, if n = 3:
[a, b, c]
[a, b, d]
[a, c, d]
I want to know if there is a better way to get it without generating all combinations. Any help would be appreciated.
I would proceed as follows:
Remove 'a' from the array
Generate all combinations of n - 1 elements (2 when n = 3) from the reduced array
For each combination, prepend 'a' (order does not matter in a combination, so one position is enough)
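The steps above can be sketched with itertools.combinations, assuming the list elements are unique. The helper name `combinations_containing` is hypothetical. Only combinations that already contain the required element are ever produced.

```python
from itertools import combinations

def combinations_containing(items, required, n):
    """All n-length combinations of items that include `required`,
    built without generating the combinations that lack it."""
    rest = [x for x in items if x != required]  # step 1: remove the element
    # steps 2 and 3: choose n - 1 of the others, then put `required` in front
    return [[required, *combo] for combo in combinations(rest, n - 1)]

result = combinations_containing(['a', 'b', 'c', 'd'], 'a', 3)
print(result)
```

For n = 3 this yields C(3, 2) = 3 combinations directly, instead of generating all C(4, 3) = 4 and discarding one.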
You can use a combination of itertools and a list comprehension, like this (note that this still generates every combination and then filters out the ones without 'a'):
import itertools
arr = ['a', 'b', 'c', 'd']
temp = itertools.combinations(arr, 3)
result = [list(i) for i in temp if 'a' in i]
print(result)
output:
[['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'd']]

What is the name of this array operation?

We all know that naming things is one of computer science's two hardest problems. Here's something I'm trying to find the name for, if it already has one.
Let's say I have an array consisting of 2 or more equal-length arrays. This array has 4 arrays of 3 items each:
[
[1, 2, 3],
['a', 'b', 'c'],
['i', 'ii', 'iii'],
['one', 'two', 'three']
]
and I want to apply some function to get this resulting array of 3 arrays of 4 items each:
[
[1, 'a', 'i', 'one'],
[2, 'b', 'ii', 'two'],
[3, 'c', 'iii', 'three']
]
Look at the original input and imagine you're taking vertical slices across the child arrays.
Is there a language out there that can do this with a built-in function, and if so, what is the function called? Or, in general, is there a good, succinct name for this operation?
This is called transposition ("transpose"), a well-known operation on matrices in mathematics; see https://en.wikipedia.org/wiki/Transpose.
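As one example of a built-in, Python's zip transposes a list of equal-length lists when the rows are unpacked into it. Note that zip stops at the shortest input, so rows of unequal length would be silently truncated.

```python
rows = [
    [1, 2, 3],
    ['a', 'b', 'c'],
    ['i', 'ii', 'iii'],
    ['one', 'two', 'three'],
]
# zip(*rows) passes each inner list as a separate argument; zip then yields
# tuples of the 1st items, the 2nd items, and so on (the "vertical slices")
transposed = [list(col) for col in zip(*rows)]
print(transposed)
```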
