What is the name of this array operation? - arrays

We all know that naming things is one of computer science's two hardest problems. Here's something I'm trying to find the name for, if it already has one.
Let's say I have an array comprised of 2 or more equal-length arrays. This array has 4 arrays of 3 items each:
[
[1, 2, 3],
['a', 'b', 'c'],
['i', 'ii', 'iii'],
['one', 'two', 'three']
]
and I want to apply some function to get this resulting array of 3 arrays of 4 items each:
[
[1, 'a', 'i', 'one'],
[2, 'b', 'ii', 'two'],
[3, 'c', 'iii', 'three']
]
Look at the original input and imagine you're taking vertical slices across the child arrays.
Is there a language out there that can do this with a built-in function, and if so, what is the function called? Or, in general, is there a good, succinct name for this operation?

This is called a transpose, a well-known operation on matrices in mathematics; see https://en.wikipedia.org/wiki/Transpose.
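As for a built-in: in Python, for example, zip combined with argument unpacking performs this operation. A minimal sketch, assuming all inner lists have the same length (the helper name transpose is my own, not a standard function):

```python
def transpose(rows):
    """Return the transpose of a list of equal-length lists."""
    # zip(*rows) groups the i-th items of every inner list together;
    # each resulting tuple is converted back to a list.
    return [list(column) for column in zip(*rows)]


rows = [
    [1, 2, 3],
    ['a', 'b', 'c'],
    ['i', 'ii', 'iii'],
    ['one', 'two', 'three'],
]
print(transpose(rows))
# [[1, 'a', 'i', 'one'], [2, 'b', 'ii', 'two'], [3, 'c', 'iii', 'three']]
```

Note that zip stops at the shortest inner list, so ragged input is silently truncated rather than rejected.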

Related

Sorting array items by frequency value

Is there a good way or algorithm to sort an array's items by how frequently each value occurs? Let's say I have this array:
[3, 3, 3, 5, 6, 12, 5, 5, 6]. I want the output to be [3, 5, 6, 12]. I was thinking about something like insertion sort, but I believe there could be an easier way.
Well, you definitely need the count for each element, which is O(n). Then you can make a unique list from it (let's say it has m elements) and sort it by frequency with any sorting algorithm you like. Overall it will be O(n + m log(m)).
For example, in Python:
from collections import Counter
myList = ['a', 'b', 'b', 'a', 'b', 'c']
myCounter = Counter(myList)
myUniqueList = list(myCounter)
sorted(myUniqueList, key=lambda e: myCounter[e], reverse=True)  # most frequent first, as in the question
Edited according to Paddy3118's comment.
First you count them
std::unordered_map<int, int> dict;
for (auto &val : array)
    dict[val]++; // O(N)
Then you sort them
// Copy into pairs with a non-const key so that std::sort can swap them.
std::vector<std::pair<int, int>> vec(dict.begin(), dict.end());
std::sort(vec.begin(), vec.end(), [](const auto &a, const auto &b) { return a.second > b.second; }); // O(M lg M); auto lambda parameters require C++14
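The count-then-sort idea from both answers can also be sketched with Counter.most_common, which returns the distinct items ordered from most to least frequent. The function name sort_by_frequency is only illustrative:

```python
from collections import Counter


def sort_by_frequency(items):
    """Return the distinct items, most frequent first."""
    # Counter counts every item in O(n); most_common() then sorts the
    # distinct items by descending count (ties keep first-seen order).
    return [item for item, _count in Counter(items).most_common()]


print(sort_by_frequency([3, 3, 3, 5, 6, 12, 5, 5, 6]))
# [3, 5, 6, 12]
```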

Applying transform_lookup on datasets with different number of rows

I am currently learning Altair's maps feature and, while looking into one of the examples (https://altair-viz.github.io/gallery/airport_connections.html), I noticed that the datasets (airports.csv and flights-airport.csv) have different numbers of rows. Is it possible to apply transform_lookup even if that's the case?
Yes, it is possible to apply transform_lookup to datasets with different numbers of rows. The lookup transform amounts to a one-sided join based on a specified key column: regardless of how many rows each dataset has, for each row of the main dataset, the first match in the lookup data is joined to it.
A simple example to demonstrate this:
import altair as alt
import pandas as pd
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'x': [1, 2, 3]
})
df2 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'y': [1, 2, 3, 4]
})
alt.Chart(df1).transform_lookup(
    lookup='key',
    from_=alt.LookupData(df2, key='key', fields=['y'])
).mark_bar().encode(
    x='x:Q',
    y='y:O',
    color='key:N'
)
More information is available in the Lookup transform docs.
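To make the "one-sided join, first match wins" behaviour concrete, here is a plain-Python sketch of the semantics described above. It is an illustration only, not Altair's or Vega-Lite's implementation; the function name lookup and the choice to fill unmatched rows with None are my assumptions.

```python
def lookup(main_rows, lookup_rows, key, fields):
    """Join `fields` from the first matching lookup row onto each main row."""
    # Index the lookup data by key, keeping only the first row per key.
    index = {}
    for row in lookup_rows:
        index.setdefault(row[key], row)

    joined = []
    for row in main_rows:
        match = index.get(row[key], {})
        # Assumption: rows without a match get None for the looked-up fields.
        extra = {field: match.get(field) for field in fields}
        joined.append({**row, **extra})
    return joined


main = [{'key': 'A', 'x': 1}, {'key': 'B', 'x': 2}, {'key': 'C', 'x': 3}]
other = [{'key': 'A', 'y': 1}, {'key': 'B', 'y': 2},
         {'key': 'C', 'y': 3}, {'key': 'D', 'y': 4}]
print(lookup(main, other, 'key', ['y']))
# [{'key': 'A', 'x': 1, 'y': 1}, {'key': 'B', 'x': 2, 'y': 2}, {'key': 'C', 'x': 3, 'y': 3}]
```

The result always has exactly as many rows as the main dataset; the extra 'D' row in the lookup data is simply never used.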

Best way to generate all combinations in array that contain certain element in it

I know that I can easily get all the combinations, but is there a way to get only the ones that contain a certain element of the list? I'll give an example.
Let's say I have
arr = ['a','b','c','d']
I want to get all combinations with length (n) containing 'a', for example, if n = 3:
[a, b, c]
[a, b, d]
[a, c, d]
I want to know if there is a better way to get it without generating all combinations. Any help would be appreciated.
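I would proceed as follows:
Remove 'a' from the array
Generate all combinations of 2 elements from the reduced array
For each combination, add 'a' back in. Since order does not matter for combinations, one position (say, the front) is enough; inserting it in all three possible places would only be needed for ordered arrangements.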
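The steps above can be sketched as follows. This version generates only the combinations that are kept, and it places the required element at the front only, since position is irrelevant in a combination. The function name combinations_containing is illustrative, and the sketch assumes the required element occurs exactly once in the input:

```python
from itertools import combinations


def combinations_containing(items, required, n):
    """Return all n-element combinations of `items` that include `required`."""
    # Assumes `required` occurs exactly once in `items`.
    rest = [item for item in items if item != required]
    # Choose the remaining n - 1 elements, then put `required` in front.
    return [[required, *combo] for combo in combinations(rest, n - 1)]


print(combinations_containing(['a', 'b', 'c', 'd'], 'a', 3))
# [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'd']]
```

For a list of k items this produces C(k-1, n-1) combinations directly, instead of generating all C(k, n) and filtering.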
You can use a combination of itertools and a list comprehension, like this:
import itertools
arr = ['a', 'b', 'c', 'd']
temp = itertools.combinations(arr, 3)
result = [list(i) for i in list(temp) if 'a' in i]
print(result)
output:
[['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'd']]

Infinitely expanding array

How would I go about creating an array of arrays that can keep nesting, with arrays inside arrays and so on, without knowing in advance how deep the nesting goes?
On top of this, out of curiosity, is it possible to change type in place with Arrays, for example if I create an array with ["test"] can I subsequently change it to [["test"]] and so on?
Any comprehensive tutorials on how arrays can be nested etc. would be amazing, but currently it's still very difficult to search for Crystal topics.
You can use recursive aliases for this (see language reference for alias):
alias NestedArray = Array(NestedArray) | <YourArrayItemType(s)>
An example (carc.in):
alias NestedArray = Array(NestedArray) | Int32
array = [] of NestedArray
array << 1
array << [2, 3, 4, [5, [6, 7, [8] of NestedArray] of NestedArray] of NestedArray] of NestedArray
array << Array(NestedArray){Array(NestedArray){10, 11}}
array # => [1, [2, 3, 4, [5, [6, 7, [8]]]], [[10, 11]]]
Concerning the second question, I am not sure what you mean. You can change the type of a variable like this:
array = ["test"]
array = [array]
array # => [["test"]]

Remove one object from an array with multiple matching objects

I have an array:
array = ['a', 'b', 'c', 'a', 'b', 'a', 'a']
sorted, just to make it easier to look at:
array = ['a', 'a', 'a', 'a', 'b', 'b', 'c']
I want to remove, for example, three of the a's. array.delete('a') removes every a.
The following code 'works' but I think you'll agree it's absolutely hideous.
new_array = array.sort.join.sub!('aaa', '').split(//)
How do I do this more cleanly?
To give a bit more information on what I'm doing here, I have some strings being pushed into an array asynchronously. Those strings can be (and often are) identical to each other. If there are a certain number of those matching strings, an action is triggered, the matching object is removed (like Tetris, I guess), and the process continues.
Before the following code is run, array could be ['a', 'a', 'a', 'b']:
while array.count(product[:name]) >= product[:quantity]
  # trigger an event
  product[:quantity].times do
    array.slice!(array.index(product[:name]))
  end
end
Assuming that product[:name] is 'a' and product[:quantity] is 3, after the code above runs, array should be ['b'].
I think you have an XY-problem. Instead of an array, you should use a hash with number of occurrences as the value.
hash = Hash.new(0)
When you want to add an entity, you should do:
hash["a"] += 1
If you want to limit the number to a certain value, say k, then do:
hash["a"] += 1 unless hash["a"] == k
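The counting approach can be extended to cover the trigger-and-remove cycle from the question. Below is a rough sketch in Python; the same logic carries over directly to Ruby's Hash.new(0). The names make_tracker, on_trigger and quantity are hypothetical, and resetting by subtraction is my assumption about what "removing the matched items" should mean:

```python
from collections import defaultdict


def make_tracker(quantity, on_trigger):
    """Return an `add(name)` function that fires once `quantity` copies arrive."""
    counts = defaultdict(int)  # missing names start at 0, like Hash.new(0)

    def add(name):
        counts[name] += 1
        if counts[name] >= quantity:
            on_trigger(name)          # hypothetical event callback
            counts[name] -= quantity  # "remove" the matched items
        return dict(counts)

    return add


fired = []
add = make_tracker(quantity=3, on_trigger=fired.append)
for name in ['a', 'a', 'b', 'a']:
    state = add(name)
print(fired)  # ['a']
print(state)  # {'a': 0, 'b': 1}
```

Each insertion is O(1), and no array ever has to be scanned to find out how many matching strings have arrived.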
slice! may be the thing you're looking for:
3.times {array.slice!(array.index('a'))}
If you want to maintain, or convert, an array so it has only one instance of each element, you can use uniq or a Set instead of an array.
array = ['a', 'b', 'c', 'a', 'b', 'a', 'a']
array.uniq # => ["a", "b", "c"]
require 'set'
array.to_set # => #<Set: {"a", "b", "c"}>
A Set will automatically maintain the uniqueness of all the elements for you, which is useful if you're going to have a huge number of potentially repetitive elements and don't want to accumulate them in memory before doing a uniq on them.
@sawa mentioned that this looks like an "XY problem", and I agree.
The source of the problem is using an array instead of a hash as your basic container. An array is good when you have a queue or list of things to process in order but it's horrible when you need to keep track of the count of things, because you have to walk that array to find out how many of a certain thing you have. There are ways to coerce the information you want out of an array when you get the array as your source.
Since it looks like he identified the real problem, here are some building blocks to use around the problem.
If you have an array and want to figure out how many different elements there are, and their count:
array = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c']
array_count = array.group_by { |i| i }.map{ |k, v| [k, v.size] }.to_h
# => {"a"=>4, "b"=>2, "c"=>2}
From that point it's easy to find out which ones exceed a certain count:
array_count.select{ |k, v| v >= 3 } # => {"a"=>4}
For a quick way to remove every occurrence of an element from the array after processing, you can use an array "difference" operation:
array = ['a', 'a', 'a', 'a', 'b', 'b', 'c']
array -= ['a']
# => ["b", "b", "c"]
or delete_if:
array.delete_if { |i| i == 'a' }
array # => ["b", "b", "c"]
