Trying to append content to numpy array - arrays

I have a script that searches Twitter for a certain term and then prints out a number of attributes for the returned results.
I'm trying to append the results to a NumPy array, but just a blank array is returned. Any ideas why?
public_tweets = api.search("Trump")
tweets_array = np.empty((0,3))
for tweet in public_tweets:
    userid = api.get_user(tweet.user.id)
    username = userid.screen_name
    location = tweet.user.location
    tweetText = tweet.text
    analysis = TextBlob(tweet.text)
    polarity = analysis.sentiment.polarity
    np.append(tweets_array, [[username, location, tweetText]], axis=0)
print(tweets_array)
The behavior I am trying to achieve is something like..
array = []
array.append([item1, item2, item3])
array.append([item4,item5, item6])
array is now [[item1, item2, item3], [item4, item5, item6]].
But in Numpy :)

np.append doesn't modify the array in place; you need to assign the result back:
tweets_array = np.append(tweets_array, [[username, location, tweetText]], axis=0)
Check help(np.append):
Note that append does not occur in-place: a new array is allocated and filled.
In the second example, you are calling the list's append method, which happens in place. This is different from np.append.
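A minimal sketch of the difference, using a made-up row in place of the tweet data. The values are illustrative only, and dtype=object is an assumption added here so the strings are stored unchanged:

```python
import numpy as np

# dtype=object is an assumption for this sketch; the question used the default dtype.
tweets_array = np.empty((0, 3), dtype=object)

# The return value is discarded, so tweets_array is unchanged.
np.append(tweets_array, [["alice", "Paris", "hello"]], axis=0)
shape_without_assignment = tweets_array.shape

# Assigning the result back keeps the new row.
tweets_array = np.append(tweets_array, [["alice", "Paris", "hello"]], axis=0)
shape_with_assignment = tweets_array.shape

print(shape_without_assignment, shape_with_assignment)
```

Only the second call changes what tweets_array refers to, because np.append always returns a new array.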

Here's the source code for np.append:
In [178]: np.source(np.append)
In file: /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py
def append(arr, values, axis=None):
    ....docs
    arr = asanyarray(arr)
    if axis is None:
        .... special case, ravels
    return concatenate((arr, values), axis=axis)
In your case arr is an array, starting with shape (0,3), and values is a nested list holding one 3-element row. The rest is just a call to concatenate, so the append call is equivalent to:
np.concatenate([tweets_array, [[username, location, tweetText]]], axis=0)
But concatenate works with many items
alist = []
for ....:
    alist.append([[username, location, tweetText]])
arr = np.concatenate(alist, axis=0)
should work just as well; better because list append is quicker. Or remove a level of nesting and let np.array stack them on a new axis, just as it does with np.array([[1,2,3],[4,5,6],[7,8,9]]):
alist = []
for ....:
    alist.append([username, location, tweetText])
arr = np.array(alist) # or np.stack()
np.append has multiple problems. Wrong name. Doesn't act in place. Hides concatenate. Flattens without much warning. Limits you to 2 inputs at a time. Etc.
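As a rough illustration of the flattening problem and of the list-based alternative, here is a sketch with placeholder rows (the data is invented for the example):

```python
import numpy as np

# Placeholder rows standing in for (username, location, tweetText).
rows = [
    ["alice", "Paris", "hello"],
    ["bob", "Rome", "ciao"],
]

# Without axis, np.append ravels both arguments into a 1-D result.
flat = np.append(np.array(rows[:1]), np.array(rows[1:]))

# Collect plain Python lists, then build the array once at the end.
alist = []
for row in rows:
    alist.append(row)
arr = np.array(alist)

print(flat.shape, arr.shape)
```

The list version allocates the array once, while repeated np.append calls copy the whole array on every iteration.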

Related

numpy from pyspark.sql.dataframe.DataFrame convert to string array

I have a requirement to query a column in a pyspark.sql.dataframe.DataFrame. I wish to create a string array from that column. I am using NumPy arrays to achieve this; however, the result I get is an array of arrays:
import numpy as np
df = spark.read.load('parquetfiles/part-00000-e7dad738-8895-45e8-9926-39c9d677b999-c000.snappy.parquet', format='parquet')
data_array = np.asarray(df.select('name').collect())
print(type(data_array),data_array)
for x in data_array:
    str = x[0]
    print(type(x))
The output I get from my first print is:
<class 'numpy.ndarray'> [['London']
['New York']
['Paris']
['Rome']
['Berlin']]
And from the second print I get:
<class 'numpy.ndarray'>
So my question: is it possible to get these values as a string array or, failing that, can I create a dynamic array to which I add the values of str in my for loop as strings?
Things I've tried.
Used asarray instead of array; as you can see, I get the same result.
data_array = list(data_array): I get a list, but it's not usable as it contains all the meta too.
Open to suggestions and additional reading rather than full solutions.
Thanks.
The power of the post.
import numpy as np
df = spark.read.load('parquetfiles/part-00000-e7dad738-8895-45e8-9926-39c9d677b999-c000.snappy.parquet', format='parquet')
data_array = np.asarray(df.select('name').collect())
cases = []
for x in data_array:
    str = x[0]
    cases.append(str)
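The same result can probably be had without an explicit loop. A sketch, assuming a stand-in list of one-element tuples in place of the real df.select('name').collect() output, so no Spark session is needed to try it:

```python
import numpy as np

# Stand-in for df.select('name').collect(): one single-field row per record.
rows = [("London",), ("New York",), ("Paris",), ("Rome",), ("Berlin",)]

data_array = np.asarray(rows)  # shape (5, 1): an array of one-element rows
names = data_array.ravel()     # shape (5,): a flat array of strings
cases = names.tolist()         # plain Python list of str

print(names.shape, cases)
```

ravel() drops the extra axis, and tolist() converts the NumPy strings back to ordinary Python str objects.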

How to find a specific value in a nested array?

I'm trying to figure out how to place a value into one of three arrays and then shuffle those arrays and have the program output the index location of the value.
Here is what I have so far:
# The purpose of this program is to randomly place the name Zac
# in one of three arrays and return the array number and position of
# Zac
A1 = ["John","Steve","Frank","Charles"]
A2 = ["Sam","Clint","Stuart","James"]
A3 = ["Vic","Jim","Bill","David"]
n = [A1,A2,A3]
name = "Zac"
def placename(title, namelist)
  mix = rand(2)
  namelist[mix] << title
  namelist.shuffle
  return namelist
end
allnames = [] << placename(name, n)
def findname(allnames, key)
  allnames.each do |i|
    until allnames[i].include?(key) == true
      i+=1
    end
    location = allnames[i].find_index(key)
    puts "The location and value of #{key} is #{location}"
  end
end
findname(allnames, name)
At the moment I'm getting an "undefined method for Nil Class" error (a NoMethodError).
Can someone please clarify what I'm doing wrong with this or if there is a more effective way of going about this? Thanks in advance!!
Your approach assumes that in the block starting...
allnames.each do |i|
... that i will contain the index of the allnames element. This isn't true. i will contain the VALUE (contents) of the element.
What you could try as an alternative is...
allnames.each_with_index do |_value, i|
or, you can do...
allnames.each do |value|
and then replace all references to allnames[i] with value
another problem is that...
allnames = [] << placename(name, n)
puts the returned array of arrays inside ANOTHER array. I think what you want to do is..
allnames = placename(name, n)
I modified the last few lines. I hope this is what you wanted:
allnames = placename(name, n)
def findname(allnames, key)
  # Indices of the subarrays that contain key (nil entries are dropped by compact)
  r = allnames.map.with_index { |x, i| x.include?(key) ? i : nil }.compact
  puts "The location of value #{key} is array number #{r[0]} and item number #{allnames[r[0]].index(key)}"
end
findname(allnames, name)
Edit: Randomization
To get a randomized array number and item number, you have to do the following:
def placename(title, namelist)
  mix = rand(3) # Since the number of arrays (nested within) is 3 we can use 3 instead of 2
  namelist[mix] << title
  namelist.map!{|x|x.shuffle}.shuffle! # Shuffling each item and the whole array in place.
  return namelist
end
Assuming you want to modify the array in place, I'd do it like this:
# insert name into random subarray
def insert_name name
  subarray_idx = rand @name_arrays.size
  subarray = @name_arrays[subarray_idx]
  insertion_idx = rand subarray.size
  @name_arrays[subarray_idx].insert insertion_idx, name
  sprintf '"%s" inserted at @name_arrays[%d][%d]',
    name, subarray_idx, insertion_idx
end
# define starting array, then print & return the
# message for further parsing if needed
@name_arrays = [
  %w[John Steve Frank Charles],
  %w[Sam Clint Stuart James],
  %w[Vic Jim Bill David],
]
p(insert_name 'Zac')
This has a few benefits:
You can inspect @name_arrays to validate that things look the way you expect.
The message can be parsed with String#scan if desired.
You can modify #insert_name to return your indexes, rather than having to search for the name directly.
If you don't capture the insertion index as a return value, or don't want to parse it from your message String, you can search for it by leveraging Enumerable#each_with_index and Array#index. For example:
# for demonstration only, set this so you can get the same
# results since the insertion index was randomized
@name_arrays =
  [["John", "Steve", "Frank", "Charles"],
   ["Sam", "Clint", "Stuart", "James"],
   ["Vic", "Jim", "Zac", "Bill", "David"]]
# return indices of nested match
def find_name_idx name
  @name_arrays.each_with_index
    .map { [_2, _1.index(name)] }
    .reject { _1.any? nil }
    .pop
end
# use Array#dig to retrieve item at nested index
@name_arrays.dig *find_name_idx('Zac')

Get the index of the last occurrence of each string in an array

I have an array that is storing a large number of various names in string format. There can be duplicates.
let myArray = ["Jim","Tristan","Robert","Lexi","Michael","Robert","Jim"]
In this case I do NOT know what values will be in the array after grabbing the data from a parse server. So the data imported will be different every time. Just a list of random names.
Assuming I don't know all the strings in the array I need to find the index of the last occurrence of each string in the array.
Example:
If this is my array....
let myArray = ["john","john","blake","robert","john","blake"]
I want the last index of each occurrence so...
blake = 5
john = 4
robert = 3
What is the best way to do this in Swift?
Normally I would just make a variable for each item possibility in the array and then increment through the array and count the items but in this case there are thousands of items in the array and they are of unknown values.
Create an array with elements and their indices:
zip(myArray, myArray.indices)
then reduce into a dictionary where keys are array elements and values are indices:
let result = zip(myArray, myArray.indices).reduce(into: [:]) { dict, tuple in
    dict[tuple.0] = tuple.1
}
(myArray.enumerated() returns offsets, not indices, but it would have worked here too instead of zip, since Array has zero-based Int indices.)
EDIT: The Dictionary(_:uniquingKeysWith:) approach (@Jessy's answer) is a cleaner way to do it.
New Dev's answer is the way to go. Except, the standard library already has a solution that does that, so use that instead.
Dictionary(
    ["john", "john", "blake", "robert", "john", "blake"]
        .enumerated()
        .map { ($0.element, $0.offset) }
) { $1 }
Or if you've already got a collection elsewhere…
Dictionary(zip(collection, collection.indices)) { $1 }
Just for fun, the one-liner, and likely the shortest, solution (brevity over clarity, or was it the other way around? :P)
myArray.enumerated().reduce(into: [:]) { $0[$1.element] = $1.offset }

How to merge 2 arrays of equal length into a single dictionary with key:value pairs in Godot?

I have been trying to randomize the values in an ordered array (ex: [0,1,2,3]) in Godot. There is supposed to be a shuffle() method for arrays, but it seems to be broken and always returns "null". I have found a workaround that uses a Fisher-Yates shuffle, but the resulting array is considered "unsorted" by the engine, and therefore when I try to use methods such as bsearch() to find a value by its position, the results are unreliable at best.
My solution was to create a dictionary, comprised of an array containing the random values I have obtained, merged with a second array of equal length with (sorted) numbers (in numerical order) which I can then use as keys to access specific array positions when needed.
Question made simple...
In GDScript, how would you take 2 arrays..
ex: ARRAY1 = [0,1,2,3]
ARRAY2 = [a,b,c,d]
..and merge them to form a dictionary that looks like this:
MergedDictionary = {0:a, 1:b, 2:c, 3:d}
Any help would be greatly appreciated.
Godot does not support "zip" methodology for merging arrays such as Python does, so I am stuck merging them manually. However... there is little to no documentation about how to do this in GDScript, despite my many hours of searching.
Try this:
var a = [1, 2, 3]
var b = ["a", "b", "c"]
var c = {}
if a.size() == b.size():
    var i = 0
    for element in a:
        c[element] = b[i]
        i += 1
print("Dictionary c: ", c)
If you want to add elements to a dictionary, you can assign values to new keys the same way you would to existing keys.

How to collapse a multi-dimensional array of hashes in Ruby?

Background:
Hey all, I am experimenting with external APIs and am trying to pull in all of the followers of a User from a site and apply some sorting.
I have refactored a lot of the code; however, there is one part that is giving me a really tough time. I am convinced there is an easier way to implement this than what I have included and would be really grateful for any tips on how to do this in a much more elegant way.
My goal is simple. I want to collapse an array of arrays of hashes (I hope that is the correct way to explain it) into one array of hashes.
Problem Description:
I have an array named f_collections, which has 5 elements. Each element is an array of size 200. Each sub-element of these arrays is a hash of about 10 key-value pairs. My best representation of this is as follows:
f_collections = [ collection1, collection2, ..., collection5 ]
collection1 = [ hash1, hash2, ..., hash200]
hash1 = { user_id: 1, user_name: "bob", ...}
I am trying to collapse this multi-dimensional array into one array of hashes. Since there are five collection arrays, this means the results array would have 1000 elements - all of which would be hashes.
followers = [hash1, hash2, ..., hash1000]
Code (i.e. my attempt which I do not want to keep):
I have gotten this to work with a very ugly piece of code (see below), with nested if statements, blocks, for loops, etc... This thing is a nightmare to read and I have tried my hardest to research ways to do this in a simpler way, I just cannot figure out how. I have tried flatten but it doesn't seem to work.
I am mostly just including this code to show I have tried very hard to solve this problem, and while yes I solved it, there must be a better way!
Note: I have simplified some variables to integers in the code below to make it more readable.
for n in 1..5 do
  if n < 5
    (0..199).each do |j|
      if n == 1
        nj = j
      else
        nj = (n - 1) * 200 + j
      end
      @followers[nj] = @f_collections[n-1].collection[j]
    end
  else
    (0..199).each do |jj|
      njj = (4) * 200 + jj
      @followers[njj] = @f_collections[n-1].collection[jj]
    end
  end
end
Oh... so it is not an array of arrays of hashes, but an array of objects that each hold a collection of hashes. Kind of. Let's give it another try:
flat = f_collections.map do |col|
  col.collection
end.flatten
which can be shortened (and is more performant) to:
flat = f_collections.flat_map do |col|
  col.collection
end
This works because the items in the f_collections array are objects that have a collection attribute, which in turn is an array.
So it is an "array of things that have an array that contains hashes".
The old answer follows below. I leave it here for documentation purposes. It was based on the assumption that the data structure is an array of arrays of hashes.
Just use #flatten (or #flatten! if you want this to happen in place):
flat = f_collections.flatten
Example
sub1 = [{a: 1}, {a: 2}]
sub2 = [{a: 3}, {a: 4}]
collection = [sub1, sub2]
flat = collection.flatten # returns a new collection
p flat #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
# or use the "inplace"/"destructive" version
collection.flatten! # modifies existing collection
p collection #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
Some recommendations for your existing code:
Do not use for n in 1..5, use Ruby-Style enumeration:
["some", "values"].each do |value|
puts value
end
This way you do not need to hardcode the length (5) of the array (I did not realize you removed the variables that specify these magic numbers). If you want to detect the last iteration you can use each_with_index:
a = ["some", "home", "rome"]
a.each_with_index do |value, index|
  if index == a.length - 1
    puts "Last value is #{value}"
  else
    puts "Values before last: #{value}"
  end
end
While #flatten will solve your problem, you might want to see what a DIY solution could look like:
def flatten_recursive(collection, target = [])
  collection.each do |item|
    if item.is_a?(Array)
      flatten_recursive(item, target)
    else
      target << item
    end
  end
  target
end
Or an iterative solution (that is limited to two levels):
def flatten_iterative(collection)
  target = []
  collection.each do |sub|
    sub.each do |item|
      target << item
    end
  end
  target
end
