using ruby to extract the values in a hash in a DRY way - arrays

My app passes a json_element to different methods. Its keys vary from call to call, and some of its values are sometimes empty.
To handle it, I have been hard-coding the extraction with the following sample code:
def act_on_ruby_tag(json_element)
begin
# logger.progname = __method__
logger.debug json_element
code = json_element['CODE']['$'] unless json_element['CODE'].nil?
predicate = json_element['PREDICATE']['$'] unless json_element['PREDICATE'].nil?
replace = json_element['REPLACE-KEY']['$'] unless json_element['REPLACE-KEY'].nil?
hash = json_element['HASH']['$'] unless json_element['HASH'].nil?
I would like to eliminate the hardcoded values, but I'm not quite sure how.
I started to think through it as follows:
keys = json_element.keys
keys.each do |k|
  set_key = k.downcase
  instance_variable_set("@" + set_key, json_element[k]['$']) unless json_element[k].nil?
end
And then use @code for example in the rest of the method.
I was going to try to turn into a method and then replace all this hardcoded code.
But I wasn't entirely sure if this is a good path.

It's almost always better to return a hash structure from a method where you have things like { code: ... } rather than setting arbitrary instance variables. If you return them in a consistent container, it's easier for callers to deal with delivering that to the right location, storing it for later, or picking out what they want and discarding the rest.
It's also a good idea to try and break up one big, clunky step with a series of smaller, lighter operations. This makes the code a lot easier to follow:
def extract(json)
  json.reject do |k, v|
    v.nil?
  end.map do |k, v|
    [ k.downcase, v['$'] ]
  end.to_h
end
Then you get this:
extract(
'TEST' => { '$' => 'value' },
'CODE' => { '$' => 'code' },
'NULL' => nil
)
# => {"test"=>"value", "code"=>"code"}
If you want to persist this whole thing as an instance variable, that's a fairly typical pattern, but it will have a predictable name that's not at the mercy of whatever arbitrary JSON document you're consuming.
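As a minimal sketch of that pattern, the extracted hash can be stored once under a fixed name. The class name TagHandler and the attribute name attrs are hypothetical and do not appear in the question; only the extract logic comes from the answer above.

```ruby
# Hypothetical wrapper class; TagHandler and attrs are illustrative names.
class TagHandler
  attr_reader :attrs

  def initialize(json_element)
    # One predictable instance variable, regardless of which keys arrive.
    @attrs = extract(json_element)
  end

  private

  # Drop nil entries, then map each key to its downcased form and '$' value.
  def extract(json)
    json.reject { |_k, v| v.nil? }
        .map { |k, v| [k.downcase, v['$']] }
        .to_h
  end
end

handler = TagHandler.new({ 'CODE' => { '$' => 'code' }, 'HASH' => nil })
handler.attrs['code']      # => "code"
handler.attrs.key?('hash') # => false
```

Callers then read handler.attrs and pick out what they need, so a key such as 'REPLACE-KEY' never has to become an instance variable name.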
An alternative is to hard-code the keys in a constant like:
KEYS = %w[ CODE PREDICATE ... ]
Then use that instead, or go one step further and define it in a YAML or JSON file that you read in for configuration purposes. It really depends on how often these keys will change, and on how irregular you expect the input to be.
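A rough sketch of the constant-based variant might look like the following. The key list is an assumption taken from the four keys in the question's sample, and extract_known is a hypothetical helper name.

```ruby
# Assumed key list, copied from the question's sample code.
KEYS = %w[CODE PREDICATE REPLACE-KEY HASH].freeze

# Hypothetical helper: returns only the listed keys that are present.
def extract_known(json)
  KEYS.each_with_object({}) do |key, result|
    value = json[key]
    next if value.nil?

    # 'REPLACE-KEY' becomes :replace_key so it works as a plain symbol key.
    result[key.downcase.tr('-', '_').to_sym] = value['$']
  end
end

extracted = extract_known({
  'CODE' => { '$' => 'abc' },
  'REPLACE-KEY' => { '$' => 'k' },
  'UNEXPECTED' => { '$' => 'ignored' }
})
# extracted contains only :code and :replace_key
```

Unlike the generic version, this one ignores keys that are not in the list, which may or may not be what you want for irregular input.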

This is a slightly terser way to do what your original code does.
# The parentheses matter: .map must run on the looked-up values, not on the list of key names.
code, predicate, replace, hash = json_element.values_at(
  *%w[CODE PREDICATE REPLACE-KEY HASH]
).map { |x| x.fetch("$", nil) if x }


Array destructuring in Ruby

I've got a variable data which comes in one of the following two formats:
[1,2,3]
[[1,2,3],['a','b','c']]
At some point I need to parse this data and so I do:
main, alternative = data
While the second format works as expected, the first doesn't.
Instead it sets:
main=1
alternative=2
# 3 is dropped.
My end goal however is this:
main=[1,2,3]
alternative=nil
What's the most elegant way to do this? Ideally I'd like to avoid conditionals and long methods...
My honest answer here is don't pass data around in a fuzzy, poorly-defined structure. If at all possible, improve the underlying caller to send consistently-defined objects.
However, if you're looking for a quick patch, then how about:
# data comes in one of the following two formats:
# 1. [1,2,3]
# 2. [[1,2,3],['a','b','c']]
# So, this patch enforces some consistency in the structure:
data = [data, nil] unless data.first.is_a?(Array)
main, alternative = data
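To see the patch handle both formats, it can be wrapped in a small method. The method name split_main_and_alternative is hypothetical; the body is the patch shown above.

```ruby
# Hypothetical helper name; the first line of the body is the patch above.
def split_main_and_alternative(data)
  # Wrap the flat format so both formats have the shape [main, alternative].
  data = [data, nil] unless data.first.is_a?(Array)
  main, alternative = data
  [main, alternative]
end

split_main_and_alternative([1, 2, 3])
# => [[1, 2, 3], nil]
split_main_and_alternative([[1, 2, 3], ['a', 'b', 'c']])
# => [[1, 2, 3], ["a", "b", "c"]]
```

Note the limitation: the check looks only at the first element, so a flat array whose first element happens to be an array is treated as the nested format.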
If you are lucky enough to be running on ruby 2.7 or 3, you can use pattern matching:
case data
in [Array => main, Array => alternative]
  # here `main` and `alternative` are bound to the expected items
  # because the match succeeds by type.
in main
  # now main is bound but alternative might still be bound to the previous
  # clause, so don't use it.
  alternative = nil
end
A more fluent, but still correct, way (on Ruby 3.0 or later, where `expr in pattern` returns a boolean instead of raising) would be
data in [Array => main, Array => alternative] or data in Array => main
# now main and alternative are as expected
If the structure (length) of the array is known beforehand to be 3, you might also be comfortable with
data in [[_,_,_] => main, [_,_,_] => alternative] or data in [_,_,_] => main
so you get fewer false matches.

Storing values obtained from for each loop Scala

I'm a Scala beginner trying to store the values obtained in a Scala foreach loop, but failing miserably.
The basic foreach loop looks like this currently:
order.orderList.foreach((x: OrderRef) => {
  val references = x.ref
})
When run, this foreach loop executes twice and produces a reference each time. I'm trying to capture the reference value from each run (so two references, in either a list or an array, that I can access later).
I'm really confused about how to go about doing this...
I attempted to retrieve and store the values as an array, but when run, the array doesn't seem to hold any values.
This was my attempt:
val newArray = Array(order.orderList.foreach((x: OrderRef) => {
val references = x.ref
}))
println(newArray)
Any advice would be much appreciated. If there is a better way to achieve this, please share. Thanks
Use map instead of foreach:
order.orderList.map((x: OrderRef) => x.ref)
Also, val references = x.ref doesn't return anything. It creates a new local variable and assigns the value to it.
I agree with the first answer, and I believe the reason is as follows.
foreach (like a for loop without yield) returns Unit, whereas map returns a new collection whose element type is the result type of the function you pass in, as in this signature:
def map[B](f: (A) ⇒ B): Array[B]
Compare that with the signature of foreach:
def foreach(f: (A) ⇒ Unit): Unit
So if you want transformed data derived from your source data, consider functions such as map and flatMap. They traverse every element just as for and foreach do, but they return the resulting values (a for comprehension with yield does the same).

Check if each string from array is contained by another string array

Sorry, I'm new to Ruby (I'm a Java programmer). I have two string arrays:
Array with file paths.
Array with patterns (can be a path or a file)
I need to check each pattern against each file path. I do it this way:
@flag = false
["aa/bb/cc/file1.txt","aa/bb/cc/file2.txt","aa/bb/dd/file3.txt"].each do |source|
  ["bb/cc/","zz/xx/ee"].each do |to_check|
    if source.include?(to_check)
      @flag = true
    end
  end
end
puts @flag
This code is ok, prints "true" because "bb/cc" is in source.
I have seen several posts but can not find a better way. I'm sure there should be functions that allow me to do this in fewer lines.
Is this possible?
As mentioned by @dodecaphonic, use Enumerable#any?. Something like this:
paths.any? { |s| patterns.any? { |p| s[p] } }
where paths and patterns are arrays as defined by the OP.
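Spelled out with the arrays from the question, a minimal runnable version could look like this. It uses include? instead of s[p], which gives the same answer here and may read more clearly to a newcomer. The variable names found and matching are illustrative.

```ruby
paths = ["aa/bb/cc/file1.txt", "aa/bb/cc/file2.txt", "aa/bb/dd/file3.txt"]
patterns = ["bb/cc/", "zz/xx/ee"]

# True as soon as any path contains any pattern; stops at the first hit.
found = paths.any? { |path| patterns.any? { |pattern| path.include?(pattern) } }
puts found # prints "true"

# To see which paths matched rather than just whether one did:
matching = paths.select { |path| patterns.any? { |pattern| path.include?(pattern) } }
# matching => ["aa/bb/cc/file1.txt", "aa/bb/cc/file2.txt"]
```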
While that will work, it scales poorly: it has to do up to N*M tests for a list of N files versus M patterns. You can optimize this a little:
files = ["aa/bb/cc/file1.txt","aa/bb/cc/file2.txt","aa/bb/dd/file3.txt"]
# Create a pattern that matches all desired substrings
pattern = Regexp.union(["bb/cc/","zz/xx/ee"])
# Test until one of them hits, returns true if any matches, false otherwise
files.any? do |file|
  file.match(pattern)
end
You can wrap that up in a method if you want. Keep in mind that if the pattern list doesn't change you might want to create that once and keep it around instead of constantly re-generating it.
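One possible shape for that wrapper, assuming the pattern list is fixed, is sketched below. The constant name PATTERN and the method name any_match? are hypothetical.

```ruby
# Built once at load time; the pattern list is the one from the question.
PATTERN = Regexp.union(["bb/cc/", "zz/xx/ee"]).freeze

# Hypothetical helper. Returns true if any file matches the pattern.
# match? only reports whether there was a match, without building MatchData.
def any_match?(files, pattern = PATTERN)
  files.any? { |file| file.match?(pattern) }
end

any_match?(["aa/bb/cc/file1.txt", "aa/bb/dd/file3.txt"]) # => true
any_match?(["aa/bb/dd/file3.txt"])                       # => false
```

Regexp.union escapes plain strings, so characters such as "." in a pattern are matched literally rather than as regex metacharacters.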

puppet hiera array, loop and hash

I currently have an issue with hiera and puppet:
In my hiera I have:
mysql_user_mgmt:
  - mysql_user: 'toto@localhost'
    mysql_hash_password: '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29'
    mysql_grant_user: 'toto@localhost/*.*'
    mysql_user_table_privileges: '*.*'
  - mysql_user: 'test@localhost'
    mysql_hash_password: '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29'
    mysql_grant_user: 'test@localhost/*.*'
    mysql_user_table_privileges: '*.*'
In my puppet, I'm trying to make a loop to get data from hiera:
$mysql_user_mgmt = hiera('mysql_user_mgmt',undef)
define mysql_loop () {
  $mysql_hash_password = $name['mysql_hash_password']
  notify { "mysql_hash_password: ${mysql_hash_password}": }
}
mysql_loop { $mysql_user_mgmt: }
but I'm getting some weird errors. Can someone help me to figure out how to make the loop?
Resource titles are strings. Always.
You are trying to use the title of a mysql_loop resource to feed a hash to the type definition. That does not work. A stringified version of the hash will end up being used instead, and your later attempts to retrieve components by hash index will fail, likely with some kind of type error.
You have a few options:
1. You could restructure your definition and data a bit, and pass the aggregate data as a hash parameter. (Example below.)
2. You could restructure your definition and data a bit, and use the create_resources() function.
3. If you've moved up to Puppet 4, or if you are willing to enable the future parser in Puppet 3, then you could use one of the new(ish) looping functions such as each().
Example of alternative (1):
Reorganize the data to a hash of hashes, keyed on the user id:
mysql_user_mgmt:
  'toto@localhost':
    mysql_hash_password: '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29'
    mysql_grant_user: 'toto@localhost/*.*'
    mysql_user_table_privileges: '*.*'
  'test@localhost':
    mysql_hash_password: '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29'
    mysql_grant_user: 'test@localhost/*.*'
    mysql_user_table_privileges: '*.*'
Modify the definition:
define mysql_user ($all_user_info) {
  $mysql_hash_password = $all_user_info[$title]['mysql_hash_password']
  notify { "mysql_hash_password: ${mysql_hash_password}": }
}
Use it like so:
$mysql_user_mgmt = hiera('mysql_user_mgmt',undef)
$mysql_user_ids = keys($mysql_user_mgmt)
mysql_user { $mysql_user_ids: all_user_info => $mysql_user_mgmt }
(The keys() function is available from the puppetlabs-stdlib module.)

Should I choose a hash, an object or an array to represent a data instance in Perl?

I was always wondering about this, but never really looked thoroughly into it.
The situation is like this: I have a relatively large set of data instances. Each instance has the same set of properties, e.g.:
# a child instance
name
age
height
weight
hair_color
favorite_color
list_of_hobbies
Usually I would represent a child as a hash and keep all children together in a hash of hashes (or an array of hashes).
What always bothered me with this approach is that I don't really use the fact that all children (inner hashes) have the same structure. It seems like it might be wasteful memory-wise if the data is really large: if every inner hash is stored from scratch, the key names could take far more space than the data itself...
Also note that when I build such data structures I often nstore them to disk.
I wonder if creating a child object makes more sense in that perspective, even though I don't really need OO. Will it be more compact? Will it be faster to query?
Or perhaps representing each child as an array makes sense? e.g.:
my ($name, $age, $height, $weight, $hair_color, $favorite_color, $list_of_hobbies) = 0..6;
my $children_h = {
James => ["James", 12, 1.62, 73, "dark brown", "blue", ["playing football", "eating ice-cream"]],
Norah => [...],
Billy => [...]
};
print "James height is $children_h->{James}[$height]\n";
Recall my main concerns are space efficiency (RAM or disk when stored), time efficiency (i.e. loading a stored data-set then getting the value of property x from instance y) and ... convenience (code readability etc.).
Thanks!
Perl is smart enough to share keys among hashes. If you have 100,000 hashes with the same five keys, perl stores those five strings once, and references to them a hundred thousand times. Worrying about the space efficiency is not worth your time.
Hash-based objects are the most common kind and the easiest to work with, so you should use them unless you have a damn good reason not to.
You should save yourself a lot of trouble, start using Moose, and stop worrying about the internals of your objects (although, just between you and me, Moose objects are hash-based unless you use special extensions to make them otherwise -- and once again, you shouldn't do that without a really good reason.)
I guess it is mainly personal taste (except of course when other people have to work on your code too)
Anyway, I think you should look into Moose. It is definitely not the most time- or space-efficient option, but it is the most pleasant and most secure way of working.
(By secure, I mean that other people that use your object can't misuse it as easily)
I personally prefer an object when I'm really representing something.
And when I work with objects in Perl, I prefer Moose.
Gr,
ldx
Unless absolute speed tuning is a requirement, I would make an object using Moose. For pure speed, use constant indexes and an array.
I like objects because they reduce the mental effort needed to work with big deep structures. For example, suppose you build a data structure to represent the various classrooms in a school. You'll have something like a list of kids, a teacher and a room number. If you have everything in a big structure, you have to know the structure's internals to access the hobbies of the children in the classroom. With objects, you can do something like:
my @all_hobbies = uniq map $_->all_hobbies,
    map $_->all_students, $school->all_classrooms;
I don't care about the internals. And I can concisely generate a unique list of all the kids' hobbies. All the complicated accesses are still happening, but I don't need to worry about what is happening. I can simply use the interface.
Here's a Moose version of your child class. I set up the hobbies attribute to use the array trait, so we get a bunch of methods simply for the asking.
package Child;
use Moose;
has [ 'name', 'hair_color', 'fav_color' ] => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);
has [ 'age', 'height', 'weight' ] => (
    is => 'ro',
    isa => 'Num',
    required => 1,
);
has hobbies => (
    is => 'ro',
    # The Array trait and the [] default require an array reference type.
    isa => 'ArrayRef[Str]',
    default => sub {[]},
    traits => ['Array'],
    handles => {
        has_no_hobbies => 'is_empty',
        num_hobbies => 'count',
        has_hobbies => 'count',
        add_hobby => 'push',
        clear_hobbies => 'clear',
        all_hobbies => 'elements',
    },
);
# Good to do these, see moose best practices manual.
__PACKAGE__->meta->make_immutable;
no Moose;
Now to use the Child class:
use List::MoreUtils qw( zip );
# Bit of messing about to make array based child data into objects;
my @attributes = qw( name age height weight hair_color fav_color hobbies );
# The inner map builds a hash ref per child so that %$_ can expand it for new().
my @children = map Child->new( %$_ ),
    map { +{ zip @attributes, @$_ } }
    ["James", 12, 1.62, 73, "dark brown", "blue", ["playing football", "eating ice-cream"]],
    ["Norah", 13, 1.75, 81, "black", "red", ["computer programming"]],
    ["Billy", 11, 1.31, 63, "red", "green", ["reading", "drawing"]],
;
# Now index by name:
my %children_by_name = map { $_->name, $_ } @children;
# Here we get kids with hobbies and print them.
for my $c ( grep $_->has_hobbies, @children ) {
    my $n = $c->name;
    my $h = join ", ", $c->all_hobbies;
    print "$n likes $h\n";
}
I usually start with a hash and manipulate that, until I find instances where the data I really want is derived from the data that I have. And/or that I want some sort of peculiar--or even polymorphic--behavior.
At that point, I start creating packages to store class behavior, implementing methods as needed.
Another case is where I think this data would be useful in more than one instance. In that case, it's either rewrite all the selection cases everywhere where you think you'll need it or package the behavior in a class, so that you don't have to do too much copying or studying of the cases the next time you want to use that data.
Generally, if you don't need utter efficiency, hashes will be your best bet. In Perl an object is just a $something with a class name attached. The object can be a hash, an array, a scalar, a code reference, or even a glob reference inside. So objects can only possibly be a win in convenience, not efficiency.
If you want to give an array a shot, the typical way of making that somewhat maintainable is using constants for the field names:
use strict;
use warnings;
use constant {
NAME => 0,
AGE => 1,
HEIGHT => 2,
WEIGHT => 3,
HAIR_COLOR => 4,
FAVORITE_COLOR => 5,
LIST_OF_HOBBIES => 6,
};
my $struct = ["James", 12, 1.62, 73, "dark brown", "blue", ["playing football", "eating ice-cream"]];
# And then access it with the constants as index:
print $struct->[NAME], "\n";
$struct->[AGE]++; # happy birthday!
Alternatively, you could try whether using an array (object) as follows makes more sense:
package MyStruct;
use strict;
use warnings;
use Class::XSAccessor::Array
accessors => {
name => 0,
age => 1,
height => 2,
weight => 3,
hair_color => 4,
favorite_color => 5,
list_of_hobbies => 6,
};
sub new {
    my $class = shift;
    return bless([@_] => $class);
}
package main;
my $s = MyStruct->new;
$s->name("James");
$s->age(12);
$s->height(1.62);
$s->weight(73);
# ... you get the drill, but take care: The following is fine:
$s->list_of_hobbies(["foo", "bar"]);
# This can produce action-at-a-distance:
my $hobbies = ["foo", "bar"];
$s->list_of_hobbies($hobbies);
$hobbies->[1] = "baz"; # $s changed, too (due to reference)
Coming back to my original point: Usually, you want hashes or hash-based objects.
Whenever I try to decide between using a hash or an array to store data, I almost always use a hash. I can almost always find a useful way to index the values in the list for quick lookup. However, your question is more about hashes of array refs vs hashes of hash refs vs hashes of object refs.
In your example above, I would have used a hash of hash refs rather than a hash of array refs. The only time I would use an array is when there is an inherent order in the data that should be maintained, that way I can look things up in order. In this case, there isn't really any inherent order in the arrays you're storing (e.g., you arbitrarily chose height before weight), so it would be more appropriate (in my humble opinion) to store the data as a hash where the keys are descriptions of the data you're storing (name, height, weight, etc).
As to whether you should use a hash of hash refs or a hash of object refs, that can often be a matter of preference. There is some overhead associated with object-oriented Perl, so I try only to use it when I can get a large benefit in, say, usability. I usually only use objects/classes when there are actions inherently associated with the data (so I can write $my_obj->fix(); rather than fix($my_obj);). If you're just storing data, I would say stick with a hash.
There should not be a significant difference in RAM usage or in time to read from/write to disk. In terms of readability, I think you will get a huge benefit using hashes over arrays, since with the hashes the keys actually make sense, but the arrays are just indexed by numbers that have no real relationship with the data. This may require more disk space for storage if you're storing in plain text, but if that's a huge concern you can always compress the data!
