Is there YAML syntax for sharing part of a list or map?

So, I know I can do something like this:
sitelist: &sites
  - www.foo.com
  - www.bar.com
anotherlist: *sites
And have sitelist and anotherlist both contain www.foo.com and www.bar.com. However, what I really want is for anotherlist to also contain www.baz.com, without having to repeat www.foo.com and www.bar.com.
Doing this gives me a syntax error in the YAML parser:
sitelist: &sites
  - www.foo.com
  - www.bar.com
anotherlist: *sites
  - www.baz.com
Just using anchors and aliases it doesn't seem possible to do what I want without adding another level of substructure, such as:
sitelist: &sites
  - www.foo.com
  - www.bar.com
anotherlist:
  - *sites
  - www.baz.com
Which means the consumer of this YAML file has to be aware of the extra level of nesting.
Is there a pure YAML way of doing something like this? Or will I have to use some post-YAML processing, such as implementing variable substitution or auto-lifting of certain kinds of substructure? I'm already doing that kind of post-processing to handle a couple of other use-cases, so I'm not totally averse to it. But my YAML files are going to be written by humans, not machine generated, so I would like to minimise the number of rules that need to be memorised by my users on top of standard YAML syntax.
I'd also like to be able to do the analogous thing with maps:
namedsites: &sites
  Foo: www.foo.com
  Bar: www.bar.com
moresites: *sites
  Baz: www.baz.com
I've had a search through the YAML spec, and couldn't find anything, so I suspect the answer is just "no you can't do this". But if anyone has any ideas that would be great.
EDIT: Since there have been no answers, I'm presuming that no one has spotted anything I haven't in the YAML spec, and that this can't be done at the YAML layer. So I'm opening up the question to ideas for post-processing the YAML to help with this, in case anyone finds this question in future.

The merge key type is probably what you want. It uses a special << mapping key to indicate merges, allowing an alias to a mapping (or a sequence of such aliases) to be used as an initializer to merge into a single mapping. Additionally, you can still explicitly override values, or add more that weren't present in the merge list.
It's important to note that it works with mappings, not sequences as in your first example. This makes sense when you think about it, and your example looks like it probably doesn't need to be sequential anyway. Simply changing your sequence values to mapping keys should do the trick, as in the following (untested) example:
sitelist: &sites
  ? www.foo.com  # "www.foo.com" is the key, the value is null
  ? www.bar.com
anotherlist:
  << : *sites    # merge *sites into this mapping
  ? www.baz.com  # add extra stuff
Some things to notice. Firstly, since << is a key, it can only be specified once per node. Secondly, when using a sequence of aliases as the value, the order is significant: keys from mappings earlier in the sequence take precedence over those from later ones. This doesn't matter in the example here, since there aren't associated values, but it's worth being aware of.
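For what it's worth, parsers that implement the merge key type resolve this at load time. A quick sketch using PyYAML (assuming PyYAML is the parser in use; its safe_load resolves <<), written with explicit null values rather than the ? key form:

import yaml

doc = """
sitelist: &sites
  www.foo.com: null
  www.bar.com: null
anotherlist:
  << : *sites
  www.baz.com: null
"""

data = yaml.safe_load(doc)
# Merged keys come first, then the locally defined ones:
print(list(data["anotherlist"]))
# -> ['www.foo.com', 'www.bar.com', 'www.baz.com']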

As the previous answers have pointed out, there is no built-in support for extending lists in YAML. I am offering yet another way to implement it yourself. Consider this:
defaults: &defaults
  sites:
    - www.foo.com
    - www.bar.com

setup1:
  <<: *defaults
  sites+:
    - www.baz.com
This will be processed into:
defaults:
  sites:
    - www.foo.com
    - www.bar.com

setup1:
  sites:
    - www.foo.com
    - www.bar.com
    - www.baz.com
The idea is to merge the contents of a key ending with a '+' into the corresponding key without the '+'. I implemented this in Python and published here.
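For readers who don't want to follow the link, here is a minimal sketch of the same idea in Python (a hypothetical helper, not the published implementation): after parsing, walk the object graph and fold every 'name+' entry into the list under 'name'.

def extend_lists(node):
    # Fold any "name+" key into the list under "name", then recurse.
    if isinstance(node, dict):
        for key in [k for k in node if isinstance(k, str) and k.endswith("+")]:
            base = key[:-1]
            node[base] = list(node.get(base, [])) + list(node.pop(key))
        for value in node.values():
            extend_lists(value)
    elif isinstance(node, list):
        for item in node:
            extend_lists(item)
    return node

Note that the <<: *defaults merge is resolved by the YAML loader itself, so by the time this helper runs, setup1 already contains both sites and sites+.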

(Answering my own question in case the solution I'm using is useful for anyone who searches for this in future)
With no pure-YAML way to do this, I'm going to implement this as a "syntax transformation" sitting between the YAML parser and the code that actually uses the configuration file. So my core application doesn't have to worry at all about any human-friendly redundancy-avoidance measures, and can just act directly on the resulting structures.
The structure I'm going to use looks like this:
foo:
  MERGE:
    - - a
      - b
      - c
    - - 1
      - 2
      - 3
Which would be transformed to the equivalent of:
foo:
  - a
  - b
  - c
  - 1
  - 2
  - 3
Or, with maps:
foo:
  MERGE:
    - fork: a
      spoon: b
      knife: c
    - cup: 1
      mug: 2
      glass: 3
Would be transformed to:
foo:
  fork: a
  spoon: b
  knife: c
  cup: 1
  mug: 2
  glass: 3
More formally, after calling the YAML parser to get native objects from a config file, but before passing the objects to the rest of the application, my application will walk the object graph looking for mappings containing the single key MERGE. The value associated with MERGE must be either a list of lists, or a list of maps; any other substructure is an error.
In the list-of-lists case, the entire map containing MERGE will be replaced by the child lists concatenated together in the order they appeared.
In the list-of-maps case, the entire map containing MERGE will be replaced by a single map containing all of the key/value pairs in the child maps. Where there is overlap in the keys, the value from the child map occurring last in the MERGE list will be used.
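To make those two rules concrete, here is a minimal sketch of the transformation in Python (hypothetical helper name, illustrating the rules above rather than the exact code in my application):

def apply_merges(node):
    # Walk the object graph produced by the YAML parser.
    if isinstance(node, list):
        return [apply_merges(item) for item in node]
    if isinstance(node, dict):
        if set(node) == {"MERGE"}:
            children = [apply_merges(child) for child in node["MERGE"]]
            if all(isinstance(c, list) for c in children):
                # List-of-lists: concatenate in order.
                return [item for child in children for item in child]
            if all(isinstance(c, dict) for c in children):
                # List-of-maps: maps later in the MERGE list win on overlap.
                merged = {}
                for child in children:
                    merged.update(child)
                return merged
            raise ValueError("MERGE must hold a list of lists or a list of maps")
        return {key: apply_merges(value) for key, value in node.items()}
    return node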
The examples given above are not that useful, since you could have just written the structure you wanted directly. It's more likely to appear as:
foo:
  MERGE:
    - *salt
    - *pepper
This lets you create a list or map containing everything in the salt and pepper nodes defined elsewhere.
(I keep including that foo: outer map to show that MERGE must be the only key in its mapping, which means that MERGE cannot appear as a top-level name unless there are no other top-level names)

To clarify something from the two answers here, this is not supported directly in YAML for lists (but it is supported for dictionaries, see kittemon's answer).

To piggyback off of Kittemon's answer, note that you can create mappings with null values using the alternative syntax

foo:
  << : *myanchor
  bar:
  baz:

instead of the suggested syntax

foo:
  << : *myanchor
  ? bar
  ? baz
Like Kittemon's suggestion, this will allow you to use references to anchors within the mapping and avoid the sequence issue. I found myself needing to do this after discovering that the Symfony Yaml component v2.4.4 doesn't recognize the ? bar syntax.

Related

ReactJS - Is it possible to render this dictionary (via map) when the keys can change?

Question: Is it possible to render a dictionary with a key (which isn't known until an algorithm is run) whose value is an array, itself containing dictionaries with key-value pairs that are also unknown until the algorithm is run?
Detailed information
I have this dictionary:
var currentWorkers = {
  EmployeesAtRestaurant: [
    { "James": "Manager" },
    { "Jessica": "Waiter" },
    { "Bob": "Waiter" },
    { "Ben": "Chef" }
  ],
  EmployeesAtOffice: [
    { "Rebecca": "Manager" },
    { "Nicole": "Part-time Employee" },
    { "Robert": "Full-time Employee" },
    { "Eric": "Full-time Employee" }
  ],
  EmployeesAtZoo: [
    { "Robert": "Manager" },
    { "Naomi": "Part-time Employee" },
    { "Jennifer": "Full-time Employee" },
    { "Ken": "Full-time Employee" }
  ]
};
And I want to render it on a page as below (mock up). It is to display employees of an organisation:
What I've tried
I've read some previous answers on Stack (Push component to array of components - ReactJS) on how to attempt this, but their dictionaries use a simple key and value pair, and since my key is not known (i.e. I can't do dictionary.Organisation, for example) I'm not able to do the above.
I've tried to remodel the dictionary into a model similar to the above, but then I lose a lot of the information above.
Frankly, I'm beginning to suspect my best option is to just remodel the dictionary at this point, if the above is too difficult to attempt.
To make sure I'm understanding your question: Are you talking about the special prop called key[1] that React uses for rendering?
If that's the case, the important thing is that each key is unique, but it doesn't necessarily have to be the same key that your algorithms are calculating.
If you don't have access to the results of your algorithm yet, but you still want to render the strings like in your screenshots, you'll need a different unique key to use while mapping.
The map function on Arrays sends the element as the first function parameter, and the element's index as the second parameter[2]. Lots of places will warn you that index keys aren't the best. As far as I know, this is because if the order of the Array shifts then you lose the optimization that React is trying to provide you.
If index is the only unique data you've got, however, it's reasonable to consider using it. This is especially true if the data is coming from a static source, because you know that the order of the data isn't going to shift out from under you.
Let me know if I've misunderstood your question.
[1] https://reactjs.org/docs/lists-and-keys.html#keys
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map

How to handle nested array with >> and return a flat array?

I wonder whether there is a concise way in Raku to process a nested array (an array of arrays) and flatten the result. When transforming flat arrays, >>. is handy, but if I want to return arrays and the result should be flat, what construct is available in Raku?
grammar g {
    rule port { <pnamelist> + %% "," }
    token pnamelist { <id> + }
    token id { \w }
}
class a {
    method port ($/) { make $<pnamelist>>>.made }
    method pnamelist ($/) { make $<id>>>.made }
    method id ($/) { make $/.Str }
}
my $r = g.parse("ab,cd", :rule('port'), :actions(a) ).made;
say $r; # [[a b] [c d]]
The above snippet outputs [[a b] [c d]], but what I actually want is [a b c d]. Is there a concise way to rewrite make $<pnamelist>>>.made so that it iterates over the array $<pnamelist> and gathers each element's .made in a flat list that is then the input for make?
TL;DR Flatten using positional subscripting. For your example, append [*;*] or [**] to $<pnamelist>>>.made. You'll get the same result as Liz's solution.
Why use this solution?
Liz is right that map and relatives (for, deepmap, duckmap, nodemap, and tree) are more flexible, at least collectively, and combining them with .Slip can be just the ticket.
But I often find it cognitively simpler to use these tools, and others, including hypers, to create whatever data structure, without worrying about .Sliping, and then just flatten the result at the end by appending [*;*...] or [**] as explained below.
Last but not least, it's simple and succinct:
method port ($/) { make $<pnamelist>>>.made[**] }
                                            ^^^^
Flattening N levels deep with [*;*...]
The general approach for flattening N levels deep works today as @Larry always intended.
The first Whatever strips away the outer array; each additional Whatever, separated by a ;, peels away another inner level of nesting. For your current example, two Whatevers do the job:
method port ($/) { make $<pnamelist>>>.made[*;*] }
                                            ^^^^^
This yields the same result as Liz's solution:
(a b c d)
If you want the end result to have an outer array, just add it to the end result wherever you think appropriate, e.g.:
method port ($/) { make [ $<pnamelist>>>.made[**] ] }
Bulldozing with [**]
If you want to bulldoze a nested array/list, peeling away all nesting no matter how deep, you could just write more *;s than you can possibly need. Any extras won't make any difference.
But the desire to bulldoze is natural enough, and comes up often enough, that it makes sense to have an operation that does it without needing a hacky notion like "just write lots of *;".
So it should come as no surprise that @Larry specified such a bulldozing operation a decade or so ago. It's nicely consistent with the rest of Raku's feel, using a HyperWhatever (**) as an indexing value.
But trying it:
method port ($/) { make $<pnamelist>>>.made[**] }
                                            ^^^^
currently yields:
HyperWhatever in array index not yet implemented. Sorry.
Fortunately one can very easily "fake" it:
sub postfix:< [**] > (\lhs) { gather lhs.deepmap: *.take }
The body of the postfix comes from here.
With this in place, changing the [*;*] to [**] will work for your example, and will continue to work no matter how deeply nested its left-hand side is.
And presuming HyperWhatever in array index is one day implemented as a built-in, one will be able to remove the postfix definition, and any code using it will work without it -- and, presumably, get a speedup.
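For comparison, the bulldozing semantics are easy to express outside Raku as well; a rough Python analogue, for illustration only:

def bulldoze(node):
    # Peel away every level of list nesting, however deep.
    if isinstance(node, list):
        for item in node:
            yield from bulldoze(item)
    else:
        yield node

print(list(bulldoze([["a", "b"], ["c", ["d"]]])))  # ['a', 'b', 'c', 'd']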
make $<pnamelist>.map(*.made.Slip)
When you slip a list of values in a map, they get flattened.
Using >>. is nice in many cases, but personally I prefer .map as it allows for more flexibility.

How do I use Array#dig and Hash#dig introduced in Ruby 2.3?

Ruby 2.3 introduces a new method on Array and Hash called dig. The examples I've seen in blog posts about the new release are contrived and convoluted:
# Hash#dig
user = {
  user: {
    address: {
      street1: '123 Main street'
    }
  }
}

user.dig(:user, :address, :street1) # => '123 Main street'

# Array#dig
results = [[[1, 2, 3]]]
results.dig(0, 0, 0) # => 1
I'm not using triple-nested flat arrays. What's a realistic example of how this would be useful?
UPDATE
It turns out these methods solve one of the most commonly-asked Ruby questions. The questions below have something like 20 duplicates, all of which are solved by using dig:
How to avoid NoMethodError for missing elements in nested hashes, without repeated nil checks?
Ruby Style: How to check whether a nested hash element exists
In our case, NoMethodErrors due to nil references are by far the most common errors we see in our production environments.
The new Hash#dig allows you to omit nil checks when accessing nested elements. Since hashes are best used for when the structure of the data is unknown, or volatile, having official support for this makes a lot of sense.
Let's take your example. The following:
user.dig(:user, :address, :street1)
Is not equivalent to:
user[:user][:address][:street1]
In the case where user[:user] or user[:user][:address] is nil, this will result in a runtime error.
Rather, it is equivalent to the following, which is the current idiom:
user[:user] && user[:user][:address] && user[:user][:address][:street1]
Note how it is trivial to pass a list of symbols that was created elsewhere into Hash#dig, whereas it is not very straightforward to recreate the latter construct from such a list. Hash#dig allows you to easily do dynamic access without having to worry about nil references.
Clearly Hash#dig is also a lot shorter.
One important point to take note of is that Hash#dig itself returns nil if any of the keys turns out to be missing, which can lead to the same class of errors one step down the line, so it can be a good idea to provide a sensible default. (This way of providing an object which always responds to the methods expected is called the Null Object Pattern.)
Again, in your example, an empty string or something like "N/A", depending on what makes sense:
user.dig(:user, :address, :street1) || ""
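The semantics are also easy to emulate elsewhere; for readers more familiar with Python, a rough analogue of dig (a hypothetical helper, for illustration only; note that Ruby's dig actually raises TypeError on non-diggable intermediates, which this version swallows):

from functools import reduce

def dig(obj, *keys):
    # Walk a sequence of keys/indexes, returning None as soon as a
    # step is missing, much like Ruby's dig returns nil.
    def step(acc, key):
        if acc is None:
            return None
        try:
            return acc[key]
        except (KeyError, IndexError, TypeError):
            return None
    return reduce(step, keys, obj)

user = {"user": {"address": {"street1": "123 Main street"}}}
print(dig(user, "user", "address", "street1"))  # 123 Main street
print(dig(user, "user", "phone", "mobile"))     # None, not an exception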
One way would be in conjunction with the splat operator reading from some unknown document model.
some_json = JSON.parse( '{"people": {"me": 6, ... } ...}' )
# => "{"people" => {"me" => 6, ... }, ... }
a_bunch_of_args = response.data[:query]
# => ["people", "me"]
some_json.dig(*a_bunch_of_args)
# => 6
It's useful for working your way through deeply nested Hashes/Arrays, which might be what you'd get back from an API call, for instance.
In theory it saves a ton of code that would otherwise check at each level whether another level exists, without which you risk constant errors. In practice you may still need a lot of this code, as dig will still raise errors in some cases (e.g. if anything in the chain is a non-keyed object).
It is for this reason that your question is actually really valid - dig hasn't seen the usage we might expect. This is commented on here for instance: Why nobody speaks about dig.
To make dig avoid these errors, try the KeyDial gem, which I wrote to wrap around dig and force it to return nil/default if any error crops up.

Gremlin / Bulbflow: How to select nodes based on their edges and related vertices' properties

Sorry for the long post, but I want to avoid any misunderstanding about what I'm looking for :)
I am currently discovering graph databases, and experimenting a bit with bulbflow/neo4j.
Thus, I am trying to use Gremlin for most of my requests, but I do not know whether the request I want is even feasible or not. I may even be wrong about trying to use a graph DB for such a use case, so feel free to tell me whether you think I'm on the right path or not.
First, let me provide a bit of context:
I work on an early-stage open-source project, which is a compiler for a DSL language generating C code. We are currently planning to re-write the whole thing in Python for many, many reasons (the language, re-designing, opening it up to a community, and such...). The compiler includes what I'll call a cache of the compiled interfaces and templates. The interfaces describe the templates, and each template is associated with a configuration (a list of typed values associated with variables described by the interfaces).
The aim of the request I'm wishing to build is to select a single template implementation depending on an input configuration (actually used in the generation mechanism of the compiler). In the end, I want to be able to request directly through Gremlin (if possible at all) the single element I'm looking for, in order to guarantee the uniqueness of the elements that can be found within this "cache". Currently, I match this configuration manually in the Python code, but I want to know if it is feasible to do it directly within Gremlin.
-
So let's define a sample graph for my use-case:
We have three types of vertices:
Def (Definition), contains a String property called "signature", which is actually the signature of the template defined by this node.
Impl (Implementation), containing two properties which are paths to the original source and pre-compiled files.
Var (variable), containing a String property which is the signature of the variable.
Then, a few kinds of edges:
Def -> impl_by -> Impl (multiple implementations can exist for a definition, does not contain any property)
Impl -> select_by -> Var (implementations may be selected through a constraint over a configuration variable's value; each edge of this type actually contains three properties: type, value, and constraint, a comparison operator)
The select_by edge (or relationship, following Bulbflow's vocabulary) describes a selection constraint, and thus has the following properties:
val (value associated with the variable for the origin implementation)
op (comparison operator telling which kind of comparison to make for the constraint to be valid or not)
This translates into a graph such as (I'll omit the types from the select_by edges in this graph):
-- select_by { value="John", op="="} ---------
| \
(1)--Impl--- select_by { value=12, op=">"} ------ \
| \ \
| \ |- Var("name")
| |- select_by { value="Peter", op="="} -----------/
Def (2)--Impl-- \/
| |- select_by { value=15, op="<"} ---- /\
| \ / \
| |-/----|--- Var("ver")
(3)--Impl--- select_by { value="Kat", op="!="} ------/ /
| /
|--- select_by { value=9, op=">"} ---------/
What I want to do is to select one (or more) Impl depending on their relationship with the Vars. Let's say I have a configuration as follows:
Config 1:
variable="name", value="Peter"
variable="ver", value=16
This would select Impl(3), since Peter != Kat AND 16 > 9, but not Impl(1) (since Peter != John) nor Impl(2) (since 16 !< 15).
I was blocked on multiple levels, so I was starting to wonder if this was even feasible:
I could not find how to pass such arguments (the configuration) to a Gremlin script
I could not find how to select the Impl based on conditions over the outgoing edges.
I hope this wasn't too confusing.
Cheers, and thanks!
EDIT:
I managed to make part of my request work, by repeatedly using backtracking and filters. The request (X being the starting vertex, VALUE the value I want to match, and NAME the name of the variable to be matched) looks like this:
Basis of the request:
g.v(X).out('impl').as('implem')
Repeat this part for each VALUE/NAME pair:
.out('select_by').filter{it.value=='VALUE'}
.inV('select_by').filter{it.name=='NAME'}
.back('implem')
The only thing currently missing is that I do not know how to use the select_by edge's 'op' property to determine how to build the filter. For instance, there are cases where I want to match the configuration exactly (and thus, as in this request, I ignore the 'op' property), but there are cases where I want to take the 'op' property into account and use the associated comparator in the filters.
Is there any way to do that? (Or should I post another question?)

C: Regular expression optimization | Saving state

Let's suppose I'd like to search through "AAAAAAAABBBBBBB" and "AAAAAAAACCCCCCCCCCC".
I search for the pattern "(AB|AC)".
Is there a way to save the state of the search after searching the "AAAAAAAA" part and then continue with the [B..] and [C..] parts separately, so that I need to search the [A..] part only once?
I wrote a short pseudocode example to be more clear.
step one:
pattern = "(AB|AC)"
match("AAAAAAAA", pattern)
save_state()
step two:
match("BBBBBBB", pattern)
should find the match "AB"
step three:
restore_state()
match("CCCCCCCCCCC", pattern)
should find match "AC"
If you use a regex flavor that uses a (real) NFA/DFA approach (like RE2), you don't have to store anything, since every input character is (or should be) used only once (this could not get better).
If your flavor uses a backtracking algorithm, you might be in luck. Some of these engines allow you to introduce a non-backtracking (aka possessive) part by using (?>x) or x{1}+ (where {1} could be any quantifier).
So in your case, it could be (if it's allowed)
(?>A)(B|C)
or
A{1}+(B|C)
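To see the possessive idea in action without C: Python's re module accepts atomic groups and possessive quantifiers as of Python 3.11, so the suggestion can be sketched there (an illustration of the technique, not of any particular C library):

import re  # atomic groups (?>...) require Python 3.11+

# The A-run is matched atomically, so the engine never backtracks
# into it while trying the (B|C) alternatives.
m = re.search(r"(?>A+)(B|C)", "AAAAAAAABBBBBBB")
print(m.group(0) if m else None)  # -> 'AAAAAAAAB'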
