Ruby: Extract elements from deeply nested JSON structure based on criteria - arrays

I want to extract every marketId from every market that has a marketName == 'Moneyline'. I tried a few combinations of .map, .reject, and/or .select but can't narrow it down, as the complicated structure is confusing me.
There are many markets in events, and there are many events as well. A sample of the structure (tried to edit it for brevity):
{"currencyCode"=>"GBP",
"eventTypes"=>[
{"eventTypeId"=>6423,
"eventNodes"=>[
{"eventId"=>28017227,
"event"=>
{"eventName"=>"Philadelphia @ Seattle"
},
"marketNodes"=>[
{"marketId"=>"1.128274650",
"description"=>
{"marketName"=>"Moneyline"}
},
{"marketId"=>"1.128274625",
"description"=>
{"marketName"=>"Winning Margin"}
}}}]},
{"eventId"=>28018251,
"event"=>
{"eventName"=>"Arkansas @ Mississippi State"
},
"marketNodes"=>[
{"marketId"=>"1.128299882",
"description"=>
{"marketName"=>"Under/Over 60.5pts"}
},
{"marketId"=>"1.128299881",
"description"=>
{"marketName"=>"Moneyline"}
}}}]},
{"eventId"=> etc....
Tried all kinds of things, for example,
markets = json["eventTypes"].first["eventNodes"].map {|e| e["marketNodes"].map { |e| e["marketId"] } if (e["marketNodes"].map {|e| e["marketName"] == 'Moneyline'})}
markets.flatten
# => yields every marketId not every marketId with marketName of 'Moneyline'
Getting a simple array with every marketId from Moneyline markets with no other information is sufficient. Using Rails methods is fine too if preferred.
Sorry if my editing messed up the syntax. Here's the source. It looks like this only with => instead of : after parsing the JSON.
Thank you!

I love nested maps and selects :D
require 'json'
hash = JSON.parse(File.read('data.json'))
moneyline_market_ids = hash["eventTypes"].map{|type|
type["eventNodes"].map{|node|
node["marketNodes"].select{|market|
market["description"]["marketName"] == 'Moneyline'
}.map{|market| market["marketId"]}
}
}.flatten
puts moneyline_market_ids.join(', ')
#=> 1.128255531, 1.128272164, 1.128255516, 1.128272159, 1.128278718, 1.128272176, 1.128272174, 1.128272169, 1.128272148, 1.128272146, 1.128255464, 1.128255448, 1.128272157, 1.128272155, 1.128255499, 1.128272153, 1.128255484, 1.128272150, 1.128255748, 1.128272185, 1.128278720, 1.128272183, 1.128272178, 1.128255729, 1.128360712, 1.128255371, 1.128255433, 1.128255418, 1.128255403, 1.128255387
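The same traversal can also be written with flat_map, which flattens one level at each step so no final flatten is needed. This is only a sketch: the inline hash is a trimmed copy of the sample from the question (not the full data.json), and Hash#dig assumes Ruby 2.3 or newer.

```ruby
# Trimmed sample mirroring the structure shown in the question.
data = {
  "eventTypes" => [
    { "eventTypeId" => 6423,
      "eventNodes" => [
        { "eventId" => 28017227,
          "marketNodes" => [
            { "marketId" => "1.128274650", "description" => { "marketName" => "Moneyline" } },
            { "marketId" => "1.128274625", "description" => { "marketName" => "Winning Margin" } }
          ] },
        { "eventId" => 28018251,
          "marketNodes" => [
            { "marketId" => "1.128299882", "description" => { "marketName" => "Under/Over 60.5pts" } },
            { "marketId" => "1.128299881", "description" => { "marketName" => "Moneyline" } }
          ] }
      ] }
  ]
}

moneyline_ids = data["eventTypes"]
  .flat_map { |type| type["eventNodes"] }    # every event of every type
  .flat_map { |node| node["marketNodes"] }   # every market of every event
  .select { |market| market.dig("description", "marketName") == "Moneyline" }
  .map { |market| market["marketId"] }

p moneyline_ids
# => ["1.128274650", "1.128299881"]
```

Using dig instead of chained [] means a market without a "description" key is skipped rather than raising NoMethodError.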

Just for fun, here's another possible answer, this time with regexen. It is shorter but might break depending on your input data. It reads the JSON data directly as a String:
json = File.read('data.json')
market_ids = json.scan(/(?<="marketId":")[\d\.]+/)
market_names = json.scan(/(?<="marketName":")[^"]+/)
moneyline_market_ids = market_ids.zip(market_names).select{|id,name| name=="Moneyline"}.map{|id,_| id}
puts moneyline_market_ids.join(', ')
#=> 1.128255531, 1.128272164, 1.128255516, 1.128272159, 1.128278718, 1.128272176, 1.128272174, 1.128272169, 1.128272148, 1.128272146, 1.128255464, 1.128255448, 1.128272157, 1.128272155, 1.128255499, 1.128272153, 1.128255484, 1.128272150, 1.128255748, 1.128272185, 1.128278720, 1.128272183, 1.128272178, 1.128255729, 1.128360712, 1.128255371, 1.128255433, 1.128255418, 1.128255403, 1.128255387
It outputs the same result as the other answer.

Related

How could I do the sum of all values of a nested hash?

I have a nested hash like this
Aranea={
"Aranéomorphes"=>{
"Agelenidae"=>[80,1327],
"Amaurobiidae"=>[49,270],
"Ammoxenidae"=>[4,18],
"Anapidae"=>[58,233],
"Anyphaenidae"=>[56,572],
"Araneidae"=>[175,3074],
"Archaeidae"=>[5,90],
"Arkydiae"=>[2,38],
"Austrochilidae"=>[3,10],
"Caponiidae"=>[18,119],
"Cheiracanthiidae"=>[12,353],
"Cithaeronidae"=>[2,8],
"Clubionidae"=>[16,639],
"Corinnidae"=>[68,489],
"Ctenidae"=>[48,519],......
For each key (spider family), the array holds [number of genera, number of species].
I would like to get the sum of all the first elements, i.e. the total number of genera.
I tried different things without success like :
genre = []
#total = genre.transpose.map {|x| x.reduce(:+)}
Or....
def sum_deeply(h)
h.values.inject(0) { |m, v|
m + (Hash === v[0] ? sum_deeply(v[0]) : v[0].to_i)
}
end
puts sum_deeply(Aranea)
But none of these works; with transpose I get a "no implicit conversion" error.
Could anyone enlighten me on this? Thanks
Update (08.07.2020): solution found with
families = Aranea
num_genders = families.flat_map do |_family_name, species_hash|
num_genders, _num_species = species_hash.values.transpose
num_genders
end
Thanks to Kache for his help on this.
This should do what you want:
families = Aranea
num_genders = families.flat_map do |_family_name, species_hash|
num_genders, _num_species = species_hash.values.transpose
num_genders
end
num_genders.inject(:+)
Just a tip: splitting out the "data extraction" and "data processing" (i.e. accessing the num_genders value vs summing them) will make your code easier to follow.
I don't think there'll be any part of the above that you won't understand, but if there is, just let me know what parts you'd like to have explained.
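To make the extraction/processing split from the tip concrete, here is a small sketch. It uses only the first three families from the question, so the total is for that trimmed sample and not the real data set; Array#sum assumes Ruby 2.4 or newer (inject(:+) works on older versions).

```ruby
# Trimmed sample: suborder => { family => [genera, species] }
aranea = {
  "Aranéomorphes" => {
    "Agelenidae"   => [80, 1327],
    "Amaurobiidae" => [49, 270],
    "Ammoxenidae"  => [4, 18]
  }
}

# Data extraction: collect the first element of every family's array.
genus_counts = aranea.values.flat_map do |families|
  families.values.map(&:first)
end

# Data processing: sum what was extracted.
total_genera = genus_counts.sum

puts total_genera
# => 133
```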

Ruby Array of Hash parsing

I have a yaml file in the format:
parameters:
  - param_name: age
    requires:
      - name
  - param_name: height
    requires:
      - name
Based on this format I would like to accept a hash of keys and values and determine if the combination of keys and values is valid. For example based on the above example if someone submitted a hash with the values:
{'age' => 15, 'height' => '6ft'}
it would be considered invalid since the parameter name is required. So a valid submission would look like
{'age' => 15, 'height' => '6ft', 'name' => 'Abe Lincoln'}.
Essentially what I want is this:
For each parameter object that has a requires array underneath it, check all parameter param_names for the elements in that array; if any are missing, exit.
I have a very ugly double loop that checks for this but I want to tighten the code up. I think I can use blocks in order to validate the data I need. Here is what I have come up with so far:
require 'yaml'
requirements = YAML.load_file('./require.yaml')
require_fields = Array.new
requirements['parameters'].each do |param|
require_fields.concat(param['requires']) if param.has_key? 'requires'
end
require_fields.each do |requirement|
found = false
requirements['parameters'].each do |param|
if param['param_name'] == requirement
found = true
end
end
abort "#{requirement} is a required field" unless found
end
You can clean this up a lot if you make it more idiomatic Ruby:
require 'yaml'
requirements = YAML.load_file('./require.yaml')
require_fields = requirements['parameters'].select do |param|
param.has_key?('requires')
end.flat_map do |param|
param['requires']
end
require_fields.each do |requirement|
found = requirements['parameters'].any? do |param|
param['param_name'] == requirement
end
abort "#{requirement} is a required field" unless found
end
You could also do this:
require_fields = requirements['parameters'].flat_map do |param|
param['requires']
end.compact
That's probably good enough so long as each requires entry is either an array or nil.
You could also transform that input YAML into a simple hash structure of dependencies:
dependencies = requirements['parameters'].map do |param|
[ param['param_name'], param['requires'] ]
end.to_h
Then you can test really easily:
dependencies.each do |name, required_names|
# first required name that is not itself a known parameter, or nil
found = (required_names || []).find do |required_name|
!dependencies.key?(required_name)
end
abort "#{found} is a required field" if found
end
This is a really rough adaptation of your code, but I hope it gives you some ideas.
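As a rough illustration of how such a dependencies hash could be used against a submitted hash, here is a sketch. The missing_fields helper is a hypothetical name, not part of the code above, and the YAML is inlined so the example runs without require.yaml.

```ruby
require 'yaml'

# Same rules as in the question, inlined for a self-contained example.
rules = YAML.safe_load(<<~RULES)
  parameters:
    - param_name: age
      requires:
        - name
    - param_name: height
      requires:
        - name
RULES

# param_name => list of parameter names it requires (empty if none given)
dependencies = rules['parameters'].map do |param|
  [param['param_name'], param['requires'] || []]
end.to_h

# Hypothetical helper: names required by the submitted keys but not submitted.
def missing_fields(dependencies, input)
  required = input.keys.flat_map { |key| dependencies.fetch(key, []) }
  required.uniq - input.keys
end

p missing_fields(dependencies, { 'age' => 15, 'height' => '6ft' })
# => ["name"]
p missing_fields(dependencies, { 'age' => 15, 'height' => '6ft', 'name' => 'Abe Lincoln' })
# => []
```

An empty result means the submission is valid; otherwise the array can be fed straight into an error message.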
I would go with subsequent checks, collecting errors and reporting all at once:
req = YAML.load 'parameters:
  - param_name: age
    requires:
      - name
  - param_name: height
    requires:
      - name'
input = {'age' => 15, 'height' => '6ft'}
req['parameters'].each_with_object([]) do |req, err|
next unless input[req['param_name']] # nothing to check
missed = req['requires'].reject { |param| input[param] }
errors = missed.map do |param|
[req['param_name'], param].join(' requires ')
end
err.concat(errors)
end
#⇒ ["age requires name", "height requires name"]
Or, chaining:
req['parameters'].each_with_object(Hash.new { |h, k| h[k] = [] }) do |req, err|
next unless input[req['param_name']] # nothing to check
req['requires'].each do |param|
err[param] << req['param_name'] unless input[param]
end
end.map do |missing, required|
"Missing #{missing} parameter, required for: [#{required.join(', ')}]"
end.join(',')
#⇒ "Missing name parameter, required for: [age, height]"

using lookup tables to plot a ggplot and table

I'm creating a Shiny app and I'm letting the user choose which data should be displayed in a plot and a table. This choice is made through three input variables that contain 14, 4 and 2 choices respectively.
ui <- dashboardPage(
dashboardHeader(),
dashboardSidebar(
selectInput(inputId = "DataSource", label = "Data source", choices =
c("Restoration plots", "all semi natural grasslands")),
selectInput(inputId = "Variabel", label = "Variable", choices =
choicesVariables),
#choicesVariables definition is omitted here, because it's very long but it
#contains 14 string values
selectInput(inputId = "Factor", label = "Factor", choices = c("Company type",
"Region and type of application", "Approved or not approved applications",
"Age group"))
),
dashboardBody(
plotOutput("thePlot"),
tableOutput("theTable")
))
This adds up to 73 choices (yes, I know the math doesn't add up there, but some combinations are invalid). I would like to do this using a lookup table, so I created one with every valid combination of choices, like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariables[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to draw the plot and table without a 73-line-long ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with this kind of array is limited and this might be a simple issue, but any hints would be helpful!
The dataset behind the plots and tables is a data frame with 23 variables, both factors and numerical. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1=ggplot(s_A1, aes(x=Companytype,
y=Dist_brukcentrum))+geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=Dist_brukcentrum-se,
ymax=Dist_brukcentrum+se),width=.2,position=position_dodge(.9))+
scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, borrowed from the Cookbook for R:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
conf.interval=.95, .drop=TRUE) {
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
The code in its entirety is a bit too large to post, but I hope this clarifies what I'm trying to do.
Well, thanks to florian's comment I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I collected the plots (which ggplot creates as lists) and the tables into two lists:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id of the list to select the right plot and table.
output$thePlot <-renderPlot({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
plotList[as.integer(plotValue[1,4])]
})
output$theTable <-renderTable({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
skriva <- tableList[as.integer(plotValue[4])]
print(skriva)
})

How to build an array comprised of two others using only particular elements of each?

I'm writing a little program to generate some bogus top-ten sales numbers for book sales. I'm trying to do this as compactly as possible, without using MySQL or another DB.
I have written out what I want to happen. I've created a bogus catalog array and a bogus sales array whose entries refer to catalog entries by index. That part all works great.
I want to create a third array that combines the titles from the catalog array with the sales numbers from the sales array, like a join in a DB, but without any DB. I can't figure out how to do that part, though. I think once I have it I can sort it the way I want, but making that third array is killing me. I cannot figure out what I'm doing wrong or how to do it right.
So given the following code:
require 'random_word'
class BestOnline
def initialize
@catalog = Array.new
@sales = Array.new
@topten = Array.new
inventory = rand(50) + 10
days = rand(1..50)
now = Time.now
yesterday = now - 86400
saleshistory = now - (days * 86400)
(1..inventory).each do
@catalog << {
:title => "#{RandomWord.adjs.next.capitalize} #{RandomWord.nouns.next.capitalize}",
:price => rand(5.99..29.99).round(2)}
end
(0..days).each do
@sales << {
:id => rand(0...@catalog.count), # exclusive range so the id is always a valid catalog index
:salescount => rand(0..24),
:date => rand(saleshistory..now) }
end
end
def bestsellers
@sales.each do
# THIS DOESNT WORK AND I'M STUCK AS HOW TO FIX IT.
# @topten << {
# :title => @catalog[:id],
# :salescount => @sales[:salescount]
# }
end
puts @topten.group_by{ |tt| tt[:salescount]}.sort_by{ |k,v| -k}.first(10)
end
end
BestOnline.new.bestsellers
How can I create a third array that contains the titles and number of sales and output the result of the top-ten books sold?
Try this out:
def bestsellers
@sales.each do |sale|
@topten << {
title: @catalog[sale[:id]][:title],
salescount: sale[:salescount] }
end
@topten.sort! { |x, y| y[:salescount] <=> x[:salescount] }
puts @topten.first(10)
end
I suggest you write:
def bestsellers(sales)
sales.max_by(10) { |h| h[:salescount] }
end
puts bestsellers(sales)
Enumerable#max_by was permitted to have an argument in Ruby v2.2.
There are several problems with the way you've structured your code. Now that you have running code (by incorporating @fbonds66's answer), I suggest you post it at SO's sister-site Code Review. The purpose of CR is to suggest improvements to working code. If you read through some of the questions and answers there I think you will be impressed.
I was doing the dereferencing wrong trying to build the 3rd array of the 1st two:
@sales.each do |sale|
@topten << {
:title => @catalog[sale[:id]][:title],
:salescount => sale[:salescount]
}
end
I needed to work on the hash returned from .each as |sale| and use correct syntax to get what I was after from the other arrays.
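Putting the pieces from this thread together, the join could be sketched as a standalone method. Two things here are assumptions rather than part of the answers above: the catalog and sales entries are made-up sample data, and sales for the same book are summed before ranking, which the original code does not do. Enumerable#max_by with a count needs Ruby 2.2 or newer.

```ruby
# Made-up sample data in the same shape as @catalog and @sales.
catalog = [
  { title: 'Quiet Harbor',   price: 12.99 },
  { title: 'Paper Lanterns', price: 8.50 },
  { title: 'Iron Orchard',   price: 19.99 }
]
sales = [
  { id: 0, salescount: 5 },
  { id: 2, salescount: 7 },
  { id: 0, salescount: 4 },
  { id: 1, salescount: 3 }
]

def bestsellers(catalog, sales, limit = 10)
  # Sum the sales per catalog index (assumption: repeat sales should add up).
  totals = sales.each_with_object(Hash.new(0)) do |sale, sums|
    sums[sale[:id]] += sale[:salescount]
  end

  # Take the highest totals, then look up each title in the catalog.
  totals.max_by(limit) { |_id, count| count }
        .map { |id, count| { title: catalog[id][:title], salescount: count } }
end

puts bestsellers(catalog, sales)
```

With this sample, 'Quiet Harbor' comes first with 9 sales, followed by 'Iron Orchard' with 7 and 'Paper Lanterns' with 3.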

Create array from given array

Is it possible to create an Array from another Array?
Lang: Ruby on Rails
Case
Workers are entitled to fill in their own work hours. Sometimes they forget to do it. This is what I want to tackle. In the end, I want an Array with time codes of periods the worker forgot to register his hours.
timecodes = [201201, 201202, 201203, 201204, 201205, 201206, 201207, 201208, 201209, 201210, 201211, 201212, 201213, 201301, 201302, 201304, 201305, 201306, ...]
Worker works from 201203 to 201209 with us.
timecards = [201203, 201204, 201205, 201207, 201208, 201209]
As you see, he forgot to register 201206.
What I want to do
# Create Array from timecode on start to timecode on end
worked_with_us = [201203, 201204, 201205, 201206, 201207, 201208, 201209]
#=> This is the actual problem, how can I automate this?
forgot_to_register = worked_with_us.?????(timecards)
forgot_to_register = worked_with_us - timecards # Thanks Zwippie
#=> [201206]
Now I know which period the worker forgot to register his hours.
All together
How can I create an Array from another Array, giving a start and end value?
You can just subtract arrays with - (minus):
[1, 2, 3] - [1, 3] # => [2]
To build an array with years/months, this can be done with a Range, but this only works if you build an array for each year, something like:
months = (2012..2013).map do |year|
("#{year}01".."#{year}12").to_a.collect(&:to_i)
end.flatten
=> [201201, 201202, 201203, 201204, 201205, 201206, 201207, 201208, 201209, 201210, 201211, 201212, 201301, 201302, 201303, 201304, 201305, 201306, 201307, 201308, 201309, 201310, 201311, 201312]
And for the function to create those ranges dynamically:
def month_array(year_from, year_to, month_from=1, month_to=12)
(year_from..year_to).map do |year|
# Correct from/to months
mf = year_from == year ? month_from : 1
mt = year_to == year ? month_to : 12
(mf..mt).map do |month|
("%d%02d" % [year, month]).to_i
end
end.flatten
end
Update: You wanted other input parameters for this method, but I hope you can work that out yourself. :)
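If the full list of valid timecodes already exists, as it does in the question, another option is to slice it by start and end value instead of generating it. This is a sketch under that assumption; periods_between is a hypothetical helper name, and the timecodes list is shortened to 2012 only.

```ruby
# Shortened master list from the question (note it includes period 201213).
timecodes = [201201, 201202, 201203, 201204, 201205, 201206, 201207,
             201208, 201209, 201210, 201211, 201212, 201213]
timecards = [201203, 201204, 201205, 201207, 201208, 201209]

# Hypothetical helper: every known timecode from first to last, inclusive.
def periods_between(timecodes, first, last)
  timecodes.select { |code| code.between?(first, last) }
end

worked_with_us = periods_between(timecodes, 201203, 201209)
forgot_to_register = worked_with_us - timecards

p forgot_to_register
# => [201206]
```

Because it only selects from the existing list, this also copes with codes such as 201213 that a month-based generator would not produce.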

Resources