Learn Ruby the Hard Way Ex39: Adding a key to an array

I'm very stuck on this exercise, so I would be grateful for a very detailed answer.
Question:
At what point is a key added to the aDict array? I can only see the index computed from the key being used to access the array
(return aDict[bucket_id]?).
I'm looking at this code:
module Dict
  def Dict.new(num_buckets = 256)
    # Initializes a Dict with the given number of buckets.
    aDict = []
    (0...num_buckets).each do |i|
      aDict.push([])
    end
    return aDict
  end
  def Dict.hash_key(aDict, key)
    # Given a key, this creates a number and turns it
    # into an index for one of aDict's buckets.
    return key.hash % aDict.length
  end
  def Dict.get_bucket(aDict, key)
    # Given a key, find the bucket where it would go.
    bucket_id = Dict.hash_key(aDict, key)
    return aDict[bucket_id]
  end
  def Dict.get_slot(aDict, key, default = nil)
    # Returns the index, key and
    # value of a slot found in a bucket.
    bucket = Dict.get_bucket(aDict, key)
    bucket.each_with_index do |kv, i|
      k, v = kv
      if key == k
        return i, k, v
      end
    end
    return -1, key, default
  end
  def Dict.get(aDict, key, default = nil)
    # Gets the value in a bucket for the given key, or the default.
    i, k, v = Dict.get_slot(aDict, key, default)
    return v
  end
  def Dict.set(aDict, key, value)
    # Sets the key to the value,
    # replacing any existing value.
    bucket = Dict.get_bucket(aDict, key)
    i, k, v = Dict.get_slot(aDict, key)
    if i >= 0
      bucket[i] = [key, value]
    else
      bucket.push([key, value])
    end
  end
end
Let's say I require dict.rb from another file and want to run:
require_relative 'dict'
# Create a mapping of state to abbreviation
states = Dict.new()
Dict.set(states, 'Oregon', 'OR')
When does the key ('Oregon') get into the bucket, so that it can be returned by aDict[bucket_id]?

OK, first: with some keys in it, the structure of the hash table aDict looks like this:
[0] => [[k1, v1], [k2, v2]]
[1] => [[k3, v3]]
[2] => []
0, 1 and 2 are the indices. At each index position there is another array (a bucket), and each element of that array is a two-element array containing a key and a value. For example, this means that k3 is at aDict[1][0][0] (and v3 is at aDict[1][0][1]).
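A tiny literal version of that layout makes the three levels of indexing concrete. The keys and values are placeholder strings typed in by hand, not output produced by Dict:

```ruby
# Hand-written stand-in for the diagram above: three buckets, three pairs.
a_dict = [
  [["k1", "v1"], ["k2", "v2"]], # bucket 0
  [["k3", "v3"]],               # bucket 1
  []                            # bucket 2 (empty)
]

bucket = a_dict[1] # level 1: pick the bucket
pair   = bucket[0] # level 2: pick the [key, value] pair inside it
key    = pair[0]   # level 3: pick the key (pair[1] would be the value)

p bucket # prints [["k3", "v3"]]
p pair   # prints ["k3", "v3"]
p key    # prints "k3"
```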
So, when you want to insert a key and value into the hash, the Dict.set method gets called:
def Dict.set(aDict, key, value)
  # Sets the key to the value,
  # replacing any existing value.
  bucket = Dict.get_bucket(aDict, key)
get_bucket calculates the bucket index by taking the key's hash modulo the size of the array. It then returns the array stored at that index (for example: bucket = aDict[1]).
  i, k, v = Dict.get_slot(aDict, key)
get_slot finds which index in the bucket array holds your key and returns that index along with the key and value. If the key does not exist, it returns -1 (no index), the key, and the default value (nil).
(For example, i, k, v will be 0, k3, v3 because [k3, v3] is at aDict[1][0]. If you were looking for k4, i, k, v would be -1, k4, nil.)
  if i >= 0
    bucket[i] = [key, value]
  else
    bucket.push([key, value])
  end
end
This bit is easy. If i is not -1, you overwrite that slot with your key and value; otherwise you push a new two-element array onto the end of the bucket.
This is also where the key actually enters aDict: get_bucket returns a reference to the inner array stored in aDict, not a copy, so bucket[i] = ... and bucket.push(...) modify aDict[bucket_id] itself.
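To see why pushing onto bucket changes aDict, here is a small sketch. It restates hash_key and get_bucket as plain top-level methods (a trimmed stand-in for the Dict module, not the module itself) and checks object identity with equal?:

```ruby
# Trimmed stand-ins for Dict.hash_key and Dict.get_bucket from the exercise.
def hash_key(a_dict, key)
  key.hash % a_dict.length
end

def get_bucket(a_dict, key)
  a_dict[hash_key(a_dict, key)]
end

# Four empty buckets. The block form creates a separate Array per bucket;
# Array.new(4, []) would make all four slots share ONE array.
states = Array.new(4) { [] }

bucket = get_bucket(states, "Oregon")
# get_bucket hands back the very same object stored inside states, not a copy
puts bucket.equal?(states[hash_key(states, "Oregon")]) # prints true

bucket.push(["Oregon", "OR"]) # mutate the bucket...
p states.flatten(1)           # ...and the pair is now inside states: [["Oregon", "OR"]]
```

Dict.new's loop of aDict.push([]) likewise gives every bucket its own array, which is what keeps keys in different buckets from leaking into each other.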


Merge an array of arrays in Ruby on Rails

I have an array like this:
[["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
and I want this result:
"GJ, MP, KL, HR, MH"
Start with the first element of the array, ["GJ","MP"], and add it to the answer: answer_string = "GJ, MP".
Next, find MP (the last element of that pair) in the other pairs, where it should be the first element, like ["MP","KL"].
After that, add KL to the answer: answer_string = "GJ, MP, KL", and keep going the same way.
That is the output I want.
Given
ary = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
(where each element is in fact an edge in a simple graph that you need to traverse), your task can be solved quite straightforwardly:
acc = ary.first.dup
ary.size.times do
  # Find an edge whose "from" value is equal to the latest "to" one
  next_edge = ary.find { |a, _| a == acc.last }
  acc << next_edge.last if next_edge
end
acc
#=> ["GJ", "MP", "KL", "HR", "MH"]
The downside here is quadratic time (you search through the whole array on each iteration), which would hurt badly if the initial array is large enough. It would be faster to use an auxiliary data structure with faster lookup (a hash, for instance). Something like:
head, *tail = ary
edges = tail.to_h
tail.reduce(head.dup) { |acc, (k, v)| acc << edges[acc.last] }
#=> ["GJ", "MP", "KL", "HR", "MH"]
(I'm not joining the resulting array into a string here, but that is straightforward: .join(", ") gives "GJ, MP, KL, HR, MH".)
d = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
o = [] # List for output
c = d[0][0] # Save the current first object
loop do # Keep looping until there are no matching pairs
  o.push(c) # Push the current first object to the output
  n = d.index { |a| a[0] == c } # Get the index of the first pair that starts with `c`
  break if n.nil? # If no index is found, we have reached the end of the graph
  c = d[n][1] # Update the current first object
end
puts o.join(',') # Join the results
Updated, as the question changed dramatically. Essentially, you are navigating a graph.
I use arr.size.times to loop:
def check(arr)
  new_arr = arr.first.dup # new_arr = ["GJ","MP"] (a copy, so the caller's data is untouched)
  arr = arr.drop(1) # drop the first pair without mutating the caller's array: [["HR","MH"],["MP","KL"],["KL","HR"]]
  arr.size.times do
    find = arr.find { |e| e.first == new_arr.last }
    new_arr << find.last if find
  end
  new_arr.join(',')
end
array = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
p check(array)
#=> "GJ,MP,KL,HR,MH"
Assumptions:
a is an Array or a Hash
a is in the form provided in the original post
For each element b in a, b[0] is unique
The first thing I would do, if a is an Array, is convert it to a Hash for faster and easier lookup (this is not technically necessary, but it simplifies the implementation and should improve performance).
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
a.to_h
#=> {"GJ"=>"MP", "HR"=>"MH", "MP"=>"KL", "KL"=>"HR"}
UPDATE
If the path always runs from the first element to the end of the chain and the elements always form a complete chain, then, borrowing from @KonstantinStrukov's idea (if you prefer this option, please give him the credit ✔️):
a.to_h.then {|edges| edges.reduce { |acc,_| acc << edges[acc.last] }}.join(",")
#=> "GJ,MP,KL,HR,MH"
Caveat: if there are disconnected elements in the original, this result will contain nil (shown as trailing commas). This could be solved by adding Array#compact, but the reduce would still perform an unnecessary step for each disconnected element.
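As an alternative to compact, here is a sketch that stops at the end of the chain instead of producing nil. It also guards against a cycle (for example an extra ["MH","GJ"] pair), which would make the loop-based answer above run forever. walk_chain is a hypothetical name, and it assumes, as above, that each first element is unique:

```ruby
require 'set'

# Follows the chain through a Hash (O(1) lookup per step) and stops when the
# current last node has no outgoing edge or when the next node was already
# visited, so the result never contains nil and a cycle cannot loop forever.
def walk_chain(pairs)
  return [] if pairs.empty?
  edges = pairs.to_h
  path = pairs.first.dup
  seen = Set.new(path)
  while (nxt = edges[path.last]) && !seen.include?(nxt)
    path << nxt
    seen << nxt
  end
  path
end

ary = [["GJ", "MP"], ["HR", "MH"], ["MP", "KL"], ["KL", "HR"]]
puts walk_chain(ary).join(", ") # prints GJ, MP, KL, HR, MH
```

Disconnected pairs are simply never visited, so no extra traversal or cleanup is needed for them.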
ORIGINAL
We can use a recursive method to look up the path from a given key to the end of the path. The default starting key is a[0][0].
def navigate(h, from: h.keys.first)
  return unless h.key?(from)
  [from, *navigate(h, from: h[from]) || h[from]].join(",")
end
Explanation:
navigate(h, from: h.keys.first) - the Hash to traverse and the starting point for the traversal
return unless h.key?(from) - if the Hash does not contain the from key, return nil (end of the chain)
[from, *navigate(h, from: h[from]) || h[from]].join(",") - build an Array of the from key and the recursive result of looking up the value for that key; if the recursion returns nil, append the last value instead. Then convert the Array to a String by joining the elements with a comma.
Usage:
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]].to_h
navigate(a)
#=> "GJ,MP,KL,HR,MH"
navigate(a,from: "KL")
#=> "KL,HR,MH"
navigate(a,from: "X")
#=> nil

Ruby: sort an array of hashes using another array efficiently, so processing time stays constant

I have some data that I need to export as CSV. It is currently about 10,000 records and will keep growing, so I want an efficient way to do the iteration, especially since I run several each loops one after the other.
My question: is there a way to avoid the many each loops I describe below, and if not, is there something other than Ruby's each/map I can use to keep the processing time constant irrespective of data size?
For instance:
First, I loop through all the data to flatten and rename the fields that hold array values, so that a field like issue, which holds an array, becomes issue_1 and issue_2 if the array contains two items.
Next, I do another loop to get all the unique keys in the array of hashes.
Using the unique keys from step 2, I do another loop to sort them using a different array that holds the order in which the keys should be arranged.
Finally, another loop generates the CSV.
So I iterate over the data 4 times using Ruby's each/map, and the time to complete these loops will increase with the data size.
The original data is in this form:
def data
  [
    {"file"=> ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified" => "2019-12-24", "book_title_1"=>"", "title"=> ["haha"], "edition"=> [""], "issue" => ["nov"], "creator" => ["yes", "some"], "publisher"=> ["Library"], "place_of_publication" => "London, UK"},
    {"file" => ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified"=>"2019-12-24", "book_title"=> [""], "title" => ["try"], "edition"=> [""], "issue"=> ["dec", 'ten'], "creator"=> ["tako", "bell", 'big mac'], "publisher"=> ["Library"], "place_of_publication" => "NY, USA"}]
end
The remapped data, after flattening the arrays and renaming the keys that held them:
def csv_data
  @csv_data = [
    {"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"haha", "edition_1"=>"", "issue_1"=>"nov", "creator_1"=>"yes", "creator_2"=>"some", "publisher_1"=>"Library", "place_of_publication_1"=>"London, UK"},
    {"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"try", "edition_1"=>"", "issue_1"=>"dec", "issue_2" => 'ten', "creator_1"=>"tako", "creator_2"=>"bell", 'creator_3' => 'big mac', "publisher_1"=>"Library", "place_of_publication_1"=>"NY, USA"}]
end
Sorting the headers for the above data
def csv_header(data = csv_data)
  csv_order = ["id", "edition_1", "date_uploaded", "creator_1", "creator_2", "creator_3", "book_title_1", "publisher_1", "file_1", "place_of_publication_1", "journal_title_1", "issue_1", "issue_2", "date_modified"]
  sorted_header = []
  all_keys = data.lazy.flat_map(&:keys).force.uniq.compact
  # Re-sort by numeric suffix, e.g. creator_isni_1 comes before creator_isni_2
  all_keys = all_keys.sort_by { |name| [name[/\d+/].to_i, name] }
  # For each prefix in csv_order, append every key that starts with it
  csv_order.each { |k| sorted_header.concat(all_keys.select { |e| e.start_with?(k) }) }
  sorted_header.uniq
end
Then generate the CSV, which involves yet another loop:
require 'csv'
def to_csv
  data = csv_data
  sorted_headers = csv_header(data)
  CSV.generate(headers: true) do |csv|
    csv << sorted_headers
    data.each do |hash|
      csv << hash.values_at(*sorted_headers)
    end
  end
end
To be honest, I was more intrigued to see whether I could work out your desired logic without further description than by the programming part alone (but of course I enjoyed that as well; it has been ages since I did some Ruby, so this was a good refresher). Since the goal is not clearly stated, it has to be "distilled" from your description, input data and code.
I think what you should do is to keep everything in very basic and lightweight arrays and do the heavy lifting while reading the data in one single big step.
I also made the assumption that if a key ends with a number, or if a value is an array, you want it to be returned as {key}_{n}, even if there's only one value present.
So far I came up with this code (the logic is described in the comments):
class CustomData
  # @keys array structure
  # 0: Key
  # 1: Maximum amount of values associated
  # 2: Is an array (found a {key}_n key in the feed,
  #    or a value in the feed was an array)
  #
  # @data: is a simple array of arrays
  attr_accessor :keys, :data
  CSV_ORDER = %w[
    id edition date_uploaded creator book_title publisher
    file place_of_publication journal_title issue date_modified
  ]
  def initialize(feed)
    @keys = CSV_ORDER.map { |key| [key, 0, false] }
    @data = []
    feed.each do |row|
      new_row = []
      # Sort keys in order to maintain the right order for {key}_{n} values
      row.sort_by { |key, _| key }.each do |key, value|
        is_array = false
        if key =~ /_\d+$/
          # If key ends with a number, extract key
          # and remember it is an array for the output
          key, is_array = key[/^(.*)_\d+$/, 1], true
        end
        if value.is_a? Array
          # If value is an array, even if the key did not end with a number,
          # we remember that for the output
          is_array = true
        else
          value = [value]
        end
        # Find position of key if it exists, or nil
        key_index = @keys.index { |a| a.first == key }
        if key_index
          # If you could have a combination of _n keys and array values
          # for a key in your feed, you need to change this portion here
          # to account for all previous values, which would add some complexity
          #
          # If the current amount of values is greater than the saved one, override
          @keys[key_index][1] = value.length if @keys[key_index][1] < value.length
          @keys[key_index][2] = true if is_array and not @keys[key_index][2]
        else
          # It is a new key in the @keys array
          key_index = @keys.length
          @keys << [key, value.length, is_array]
        end
        # Add value array at known key index
        # (the row is padded with nil if key_index is greater than its size)
        new_row[key_index] = value
      end
      @data << new_row
    end
  end
  def to_csv_data(headers=true)
    result, header, body = [], [], []
    if headers
      @keys.each do |key|
        # Skip keys that never occurred in the feed (0 values); otherwise
        # the header would get a column that no row fills
        next if key[1].zero?
        if key[2]
          # If the key should hold multiple values, build the header string
          key[1].times { |i| header << "#{key[0]}_#{i+1}" }
        else
          # Otherwise it is a singular value and the header goes unmodified
          header << key[0]
        end
      end
      result << header
    end
    @data.each do |row|
      new_row = []
      # Walk every known key, not just the slots stored in this row, so rows
      # that lack a key (nil slot) or are shorter than @keys stay aligned
      @keys.each_with_index do |key, index|
        value = row[index] || []
        # Use the value counter from @keys to pad with nil values,
        # if a value is not present
        key[1].times do |count|
          new_row << value[count]
        end
      end
      body << new_row
    end
    # Append the rows individually so the result is [header, row1, row2, ...]
    result.concat(body)
  end
end
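For comparison, here is a minimal sketch of the same "do the heavy lifting while reading" idea without a class. It assumes the input looks like data above (array values become {key}_1, {key}_2, ...; every other value keeps its key as-is) and that csv_order lists base names like CSV_ORDER. The helper names flatten_record, order_headers and records_to_csv are hypothetical. Column names are collected while flattening, so the only sort is over the unique header names rather than over the records, and the records themselves are walked twice (flatten, then write):

```ruby
require 'csv'

# Hypothetical helper: flattens one record, turning an array value under
# "issue" into "issue_1", "issue_2", ... and recording every column name in `seen`.
def flatten_record(record, seen)
  record.each_with_object({}) do |(key, value), row|
    if value.is_a?(Array)
      value.each_with_index do |item, i|
        name = "#{key}_#{i + 1}"
        row[name] = item
        seen[name] = true
      end
    else
      row[key] = value
      seen[key] = true
    end
  end
end

# Hypothetical helper: orders column names by the position of their base name
# (numeric suffix stripped) in csv_order, then by that suffix. Unknown base
# names go last.
def order_headers(names, csv_order)
  rank = csv_order.each_with_index.to_h
  names.sort_by do |name|
    [rank.fetch(name.sub(/_\d+\z/, ''), csv_order.size), name[/\d+\z/].to_i, name]
  end
end

def records_to_csv(records, csv_order)
  seen = {}                                          # Hash keys double as a unique set
  rows = records.map { |r| flatten_record(r, seen) } # pass 1 over the records
  headers = order_headers(seen.keys, csv_order)      # sorts unique names only
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row.values_at(*headers) } # pass 2 over the records
  end
end

records = [
  { "id" => "1", "issue" => ["nov"], "creator" => ["yes", "some"] },
  { "id" => "2", "issue" => ["dec", "ten"] }
]
puts records_to_csv(records, %w[id creator issue])
```

This cuts the number of passes, but the running time still grows linearly with the number of records; every record has to be read and written at least once.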

How to split a hash into multiple arrays of keys based on the values, without exceeding a certain sum in each array?

I have a large hash, where the keys are names, like "Alex", and the values are numeric, like 100.
How can I split this hash into multiple arrays that contain the keys, of which the sum of values doesn't exceed a certain threshold value?
Example
I have the hash
{"Alex"=>50, "Bamby"=>100, "Jordan"=>300, "Ger"=>700, "Aus"=>500, "Can"=>360}
and I want to split it into packs of 1000 from the beginning (doesn't have to be from the beginning but would be nice),
meaning:
array1 = ["Alex", "Bamby", "Jordan"] # not "Ger" bc it would exceed the 1000 in sum
array2 = ["Ger"] # not the Aus because it again would exceed the 1000
array3 = ["Aus", "Can"]
The best solution would actually be optimized so that the code makes all arrays close or equal to 1000, but that's the next step, I guess...
Thank you so much in advance! ~Alex
h = {"Alex"=>50, "Bamby"=>100, "Jordan"=>300, "Ger"=>700, "Aus"=>500, "Can"=>360}
tot = 0
h.keys.slice_before { |k| (tot += h[k]) > 1000 ? tot = h[k] : false }.to_a
#=> [["Alex", "Bamby", "Jordan"], ["Ger"], ["Aus", "Can"]]
Note that if tot > 1000 the block returns a truthy value (h[k]), and that the parentheses around tot += h[k] are necessary.
See Enumerable#slice_before.
original = {"Alex"=>50, "Bamby"=>100, "Jordan"=>300, "Ger"=>700, "Aus"=>500, "Can"=>360}
chunked = original.inject([]) do |array, (key, value)|
  array << {} unless array.any?
  if array.last.values.sum + value <= 1_000
    array.last.merge!(key => value)
  else
    array << { key => value }
  end
  array
end
# => [{"Alex"=>50, "Bamby"=>100, "Jordan"=>300}, {"Ger"=>700}, {"Aus"=>500, "Can"=>360}]
You can iterate over the elements inside the hash like this; the explanation is in the comments:
hash = {"Alex"=>50, "Bamby"=>100, "Jordan"=>300, "Ger"=>700, "Aus"=>500, "Can"=>360}
rs = []  # the outer array
rss = [] # the current inner array
m = 0    # running sum of the current inner array
hash.each do |key, n|
  if m + n <= 1000 # if the running sum plus the next value stays within 1000
    m += n         # then add it to the running sum
    rss << key     # and add the key to the current inner array
  else
    rs << rss      # otherwise adding n would exceed 1000, so store the current inner array
    m = n          # the value that overflowed becomes the start of the new sum
    rss = [key]    # and its key is the first element of a new inner array
  end
end
rs << rss # Important! The final inner array has to be added after the loop
print rs
Result:
=> [["Alex", "Bamby", "Jordan"], ["Ger"], ["Aus", "Can"]]
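The question also mentions a next step: packs that are all as close to 1000 as possible. That is the bin-packing problem, for which finding the optimum is NP-hard; a common heuristic is first-fit decreasing. Here is a sketch of it (first_fit_decreasing is a hypothetical name). Unlike the answers above, it does not keep the original key order:

```ruby
# First-fit decreasing: take the names from largest to smallest value and put
# each one into the first pack that still has room; open a new pack otherwise.
def first_fit_decreasing(hash, limit)
  packs = [] # each entry is [names, total]
  hash.sort_by { |_, value| -value }.each do |name, value|
    pack = packs.find { |_, total| total + value <= limit }
    if pack
      pack[0] << name
      pack[1] += value
    else
      packs << [[name], value] # also covers a single value above the limit
    end
  end
  packs.map(&:first)
end

h = {"Alex"=>50, "Bamby"=>100, "Jordan"=>300, "Ger"=>700, "Aus"=>500, "Can"=>360}
p first_fit_decreasing(h, 1000)
# => [["Ger", "Jordan"], ["Aus", "Can", "Bamby"], ["Alex"]]
```

For this input it still needs three packs, but two of them sum to 1000 and 960. The heuristic is not guaranteed to find the optimal packing.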

Lua: iterate with pairs in the same order as the table is written

Is there any way to loop through a table like the one below in the same order as it's written?
local tbl = {
  ["hello"] = 1,
  [2] = 2,
  [50] = 3,
  ["bye"] = 4,
  [200] = 5
}
What I mean is that when I use pairs, I get a different order every time I execute my code...
I'm searching for something like this:
function get_keys(tbl)
  local rtable = {}
  for k,v in pairs(tbl) do
    table.insert(rtable, k)
  end
  return rtable
end
local keys_of_tbl = get_keys(tbl)
for i = 1, #keys_of_tbl do -- table.getn is gone in Lua 5.2+; # gives the length
  --Do something with: tbl[keys_of_tbl[i]]
end
But because get_keys is itself based on pairs, it won't work...
In Lua, the order that pairs iterates through the keys is unspecified. However you can save the order in which items are added in an array-style table and use ipairs (which has a defined order for iterating keys in an array). To help with that you can create your own ordered table using metatables so that the key order will be maintained when new keys are added.
EDIT (earlier code inserted multiple copies of the key on updates)
To do this you can use __newindex, which is called as long as the key has not been added to the table itself. The ordered_table.insert method updates, deletes, or stores the key in the hidden tables _keys and _values. Note that __newindex is also called when we update an existing key, since we never store the value in the table itself but in the "hidden" tables _keys and _values.
Note, however, that we cannot use every key in this table: assigning to the key name "_keys" (or "_values") will overwrite our hidden table, so the safer alternative is to use the ordered_table.insert(t, key, value), ordered_table.index(t, key) and ordered_table.remove(t, key) methods.
ordered_table = {}
function ordered_table.insert(t, k, v)
  if v == nil then -- assigning nil deletes the key
    ordered_table.remove(t, k)
    return
  end
  if rawget(t._values, k) == nil then -- new key: remember its position
    t._keys[#t._keys + 1] = k
  end
  t._values[k] = v -- update/store value
end
local function find(t, value)
  for i,v in ipairs(t) do
    if v == value then
      return i
    end
  end
end
function ordered_table.remove(t, k)
  local v = t._values[k]
  if v ~= nil then
    table.remove(t._keys, find(t._keys, k))
    t._values[k] = nil
  end
  return v
end
function ordered_table.index(t, k)
  return rawget(t._values, k)
end
function ordered_table.pairs(t)
  local i = 0
  return function()
    i = i + 1
    local key = t._keys[i]
    if key ~= nil then
      return key, t._values[key]
    end
  end
end
function ordered_table.new(init)
  init = init or {}
  local t = {_keys={}, _values={}}
  local n = #init
  if n % 2 ~= 0 then
    error"in ordered_table initialization: key is missing value"
  end
  for i=1,n/2 do
    local k = init[i * 2 - 1]
    local v = init[i * 2]
    if t._values[k] ~= nil then
      error("duplicate key:"..k)
    end
    t._keys[#t._keys + 1] = k
    t._values[k] = v
  end
  return setmetatable(t,
    {__newindex=ordered_table.insert,
     __len=function(t) return #t._keys end,
     __pairs=ordered_table.pairs,
     __index=t._values
    })
end
--- Example Usage:
local t = ordered_table.new{
  "hello", 1, -- key, value pairs
  2, 2,
  50, 3,
  "bye", 4,
  200, 5
}
print(#t) -- 5 (__len metamethod)
print("hello is", t.hello) -- looked up through __index
print()
for k, v in pairs(t) do --- Lua 5.2 __pairs metamethod
  print(k, v)
end
t.bye = nil -- delete that
t[2] = 7 -- use integer keys
print(#t) -- 4 ("bye" was removed; key 2 was only updated)
No. There's no "as written in the source" order for tables. (Consider that not all keys necessarily exist in the source.) Lua has no concept of "in order" for non-contiguous integer keys.
If you want a specific order, you have to maintain that order yourself in some way.
If your table has no integer keys, you can use the integer keys 1..n to record the order: store each key at the next integer index, loop over those with ipairs, and use each stored key to look up the real value.
If the values themselves are the order you want (as in the example, where they run from 1 to 5), you can loop once and build a reverse map (value to key), which you can then iterate with ipairs.

How can I check if a lua table contains only sequential numeric indices?

How can I write a function that determines whether its table argument is a true array?
isArray({1, 2, 4, 8, 16}) -> true
isArray({1, "two", 3, 4, 5}) -> true
isArray({1, [3]="two", [2]=3, 4, 5}) -> true
isArray({1, dictionaryKey = "not an array", 3, 4, 5}) -> false
I can't see any way of finding out if the numeric keys are the only keys.
EDIT: Here's a new way to test for arrays that I discovered just recently. For each key returned by pairs, it increments a counter i and checks that t[i] is not nil. If a table with k keys has non-nil values at 1..k, those k indices must be all of its keys, so it is an array. As far as I know, this is the fastest and most elegant way to test for array-ness.
local function isArray(t)
  local i = 0
  for _ in pairs(t) do
    i = i + 1
    if t[i] == nil then return false end
  end
  return true
end
ipairs iterates over indices 1..n, where n+1 is the first integer index with a nil value
pairs iterates over all keys.
If there are more keys than there are sequential indices, then it cannot be an array.
So all you have to do is check whether the number of elements in pairs(table) equals the number of elements in ipairs(table).
The code can be written as follows:
function isArray(tbl)
  local numKeys = 0
  for _, _ in pairs(tbl) do
    numKeys = numKeys+1
  end
  local numIndices = 0
  for _, _ in ipairs(tbl) do
    numIndices = numIndices+1
  end
  return numKeys == numIndices
end
I'm pretty new to Lua, so there might be some builtin function to reduce the numKeys and numIndices calculations to simple function calls.
By "true array", I suppose you mean a table whose keys are only numbers. To check this, look at the type of every key of your table. Try this:
function isArray(array)
  for k, _ in pairs(array) do
    if type(k) ~= "number" then
      return false
    end
  end
  return true --Found nothing but numbers !
end
Note: as @eric points out, pairs is not defined to iterate in any specific order, hence this is not a valid answer.
The following should be sufficient; it checks that the keys are sequential from 1 until the end:
local function isArray(array)
  local n = 1
  for k, _ in pairs(array) do
    if k ~= n then return false end
    n = n + 1
  end
  return true
end
I recently wrote this code for another, similar question:
---Checks if a table is used as an array. That is: the keys start with one and are sequential numbers
-- @param t table
-- @return nil,error string if t is not a table
-- @return true/false if t is an array/isn't an array
-- NOTE: it returns true for an empty table
function isArray(t)
  if type(t)~="table" then return nil,"Argument is not a table! It is: "..type(t) end
  --check if all the table keys are numerical and count their number
  local count=0
  for k,v in pairs(t) do
    if type(k)~="number" then return false else count=count+1 end
  end
  --all keys are numerical. now let's see if they are sequential and start with 1
  for i=1,count do
    --Compare with nil explicitly: a stored value may be false, and "not t[i]" would wrongly reject it
    if t[i] == nil then return false end
  end
  return true
end
Here's my take on this, using #array to detect a gap or stop when too many keys have been read:
function isArray(array)
  local count=0
  for k,_ in pairs(array) do
    count=count+1
    if (type(k) ~= "number" or k < 1 or k > #array or count > #array or math.floor(k) ~= k) then
      return false
    end
  end
  if count ~= #array then
    return false
  end
  return true
end
Iterate from 1 to the number of elements (Lua arrays start at index 1) and check that an element exists at each index. If the table is not an array, some indices will be missing from the sequence.
