How do you encode a post-1994 ASN.1 EXTERNAL type?

Background:
Prior to 1994, EXTERNAL was defined like so (with automatic and explicit tagging):
EXTERNAL ::= [UNIVERSAL 8] IMPLICIT SEQUENCE
{
    direct-reference OBJECT IDENTIFIER OPTIONAL,
    indirect-reference INTEGER OPTIONAL,
    data-value-descriptor ObjectDescriptor OPTIONAL,
    encoding CHOICE
    {
        single-ASN1-type [0] ANY,
        octet-aligned [1] IMPLICIT OCTET STRING,
        arbitrary [2] IMPLICIT BIT STRING
    }
}
But since then, it has been defined as:
EXTERNAL ::= [UNIVERSAL 8] IMPLICIT SEQUENCE {
    identification CHOICE {
        syntax OBJECT IDENTIFIER,
        presentation-context-id INTEGER,
        context-negotiation SEQUENCE {
            presentation-context-id INTEGER,
            transfer-syntax OBJECT IDENTIFIER } },
    data-value-descriptor ObjectDescriptor OPTIONAL,
    data-value OCTET STRING }
Dubuisson's ASN.1 says (page 412):
the context-specific tags, in particular, which appear before the alternatives
of the encoding component (of type CHOICE) must be encoded but not those
computed in the 1994 version.
On page 413, he describes how to encode an INSTANCE OF, which he notes is encoded identically to an EXTERNAL. The identification is shown encoded as just a universal tag with number 6 (OBJECT IDENTIFIER). The encoding is shown in the form
[CONTEXT 0]
    [UNIVERSAL 2]
meaning that he is encoding an INTEGER, 5, as his choice of single-ASN1-type.
My question
If the post-1994 version of EXTERNAL is backwards-compatible, then data-value would have to translate to one of the pre-1994 encoding alternatives. Which one is it?
In other words, if I encode a post-1994 EXTERNAL (using presentation-context-id as our choice of identification just for the sake of the example), does it get encoded as
[UNIVERSAL 8]
    [UNIVERSAL 2] (presentation-context-id => indirect-reference)
    [CONTEXT 0]   (data-value => single-ASN1-type)
        [UNIVERSAL 4]
or
[UNIVERSAL 8]
    [UNIVERSAL 2] (presentation-context-id => indirect-reference)
    [CONTEXT 1]   (data-value => octet-aligned)
Thanks in advance!

Please refer to Rec. ITU-T X.690 | ISO/IEC 8825-1, clause 18. It describes in detail how backward compatibility is maintained, indicating exactly how to map a value of the X.680 EXTERNAL associated type to the following SEQUENCE defined in X.690:
[UNIVERSAL 8] IMPLICIT SEQUENCE {
    direct-reference OBJECT IDENTIFIER OPTIONAL,
    indirect-reference INTEGER OPTIONAL,
    data-value-descriptor ObjectDescriptor OPTIONAL,
    encoding CHOICE {
        single-ASN1-type [0] ABSTRACT-SYNTAX.&Type,
        octet-aligned [1] IMPLICIT OCTET STRING,
        arbitrary [2] IMPLICIT BIT STRING } }
Note also that this sequence assumes an EXPLICIT TAGS environment, rather than the AUTOMATIC TAGS environment of the sequence in X.680.
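To make the mapping concrete, here is a minimal hand-rolled DER sketch in Python. It reflects my reading of the clause-18 mapping (not something spelled out in this answer): presentation-context-id is carried as indirect-reference, and data-value is carried as the octet-aligned alternative. The payload bytes and the helper name tlv are made up for illustration.
def tlv(tag: int, content: bytes) -> bytes:
    """Encode one DER TLV; short definite lengths only (< 128 content bytes)."""
    assert len(content) < 128
    return bytes([tag, len(content)]) + content

payload = b'\xde\xad\xbe\xef'                     # made-up data-value octets
indirect_reference = tlv(0x02, b'\x03')           # [UNIVERSAL 2] INTEGER 3
octet_aligned = tlv(0x81, payload)                # [CONTEXT 1] IMPLICIT OCTET STRING
external = tlv(0x28, indirect_reference + octet_aligned)  # [UNIVERSAL 8] constructed

print(external.hex(' '))  # 28 09 02 01 03 81 04 de ad be ef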

Related

Matching an element in an unknown sequence location using structural pattern matching in Python 3.10

Is there any clever way to match on an element in an unknown location in a sequence of unknown length using structural pattern matching in Python 3.10?
Below is a non-working example illustrating what I'd like to do.
match [1, 2, "3", 4, 5]:
    # SyntaxError: multiple starred names in sequence pattern
    case [*before, str() as str_found, *after]:
        print(f"Found string: {str_found}")
If you try to use a guard clause, the matched string isn't captured:
match [1, 2, "3", 4, 5]:
    case [*elem] if any(isinstance(el, str) for el in elem):
        print("Found string, but I can't tell you its value.")
If the length is known, an or-pattern could be used, though it's not pretty:
match [1, 2, "3"]:
    case [*_, str() as str_found] | [str() as str_found, *_] | [_, str() as str_found, _]:
        print(f"Found string: {str_found}")
Based on answers and comments to other questions about structural pattern matching, I anticipate lots of responses informing me that structural pattern matching isn't the right tool for the job. I know my example doesn't showcase the benefit of using structural pattern matching for this as opposed to something like a simple for loop, but imagine parsing a nested dict and list structure resulting from a json.load(). In any case, my question isn't what the right tool is, but simply whether it can be done with this tool.
This is going to trigger some people, but here you go:
match [1, 2, "3", 4, 5]:
    # Capture the first string (if any) via a walrus in the guard; testing
    # against None rather than truthiness keeps an empty string from being missed.
    case xs if (s := next((x for x in xs if isinstance(x, str)), None)) is not None:
        print("Found string", repr(s))

Destructuring assignment in object creation

As with my previous question, this is an area where I can't tell if I've encountered a bug or a hole in my understanding of Raku's semantics. Last time it turned out to be a bug, but I doubt lightning will strike twice!
In general, I know that I can pass named arguments to a function either with syntax that looks a lot like creating a Pair (e.g. f :a(42)) or with syntax that looks a lot like flattening a Hash (e.g., f |%h). (see argument destructuring in the docs). Typically, these two are equivalent, even for non-Scalar parameters:
sub f(:@a) { dd @a }
my %h = a => [4, 2];
f :a([4,2]); # OUTPUT: «Array @a = [4, 2]»
f |%h;       # OUTPUT: «Array @a = [4, 2]»
However, when constructing an object with the default .new constructor, these two forms seem to give different results:
class C { has @.a; }
my %h = a => [4, 2];
C.new: :a([4,2]); # OUTPUT: «C.new(a => [4, 2])»
C.new: |%h;       # OUTPUT: «C.new(a => [[4, 2],])»
That is, passing :a([4,2]) results in a two-element Array, but using the argument-flattening syntax results in a one-element Array containing a two-element Array.
Is this behavior intended? If so, why? And is there syntax I can use to pass |%h in and get the two-element Array bound to an @-sigiled attribute? (I know using an $-sigiled attribute works, but I prefer the semantics of the @.)
Is this behavior intended?
Yes. Parameter binding uses binding semantics, while attribute initialization uses assignment semantics. Assignment into an array respects Scalar containers, and the values of a Hash are Scalar containers.
If so, why?
The intuition is:
When calling a function, we're not going to be doing anything until it returns, so we can effectively lend the very same objects we pass to it while it executes. Thus binding is a sensible default (however, one can use is copy on a parameter to get assignment semantics).
When creating a new object, it is likely going to live well beyond the constructor call. Thus copying - that is, assignment - semantics are a sensible default.
And is there syntax I can use to pass |%h in and get the two-element Array bound to an @-sigiled attribute?
Coerce it into a Map:
class C { has @.a; }
my %h = a => [4, 2];
say C.new: |%h.Map;
Or start out with a Map in the first place:
class C { has @.a; }
my %h is Map = a => [4, 2];
say C.new: |%h;

How to get the values from nodes in tree-sitter?

If I have a simple grammar in tree-sitter:
rules: {
  expr: $ => choice(
    /[0-9]+/,
    prec.right(seq($.expr, /[+-]/, $.expr)),
  )
}
And an input:
3+4
I get the following CST:
(start [0, 0] - [0, 3]
  (expr [0, 0] - [0, 3]
    (expr [0, 0] - [0, 1])
    (expr [0, 2] - [0, 3])))
So my question is: how do I get the values, i.e. what was parsed, from these nodes/leaves? I somehow have to evaluate the tree. I'm fairly sure there is a way, because tree-sitter also supports syntax highlighting, for which the values are needed (I guess). But I read the documentation and couldn't find any note on how to do it.
Tree-sitter's syntax tree doesn't store copies of the input text. So to get the text of a particular token, you would have to use the ranges that Tree-sitter gives you to compute slices of your original source code.
In the Python binding, this looks like this:
# assumes `parser` is a tree_sitter.Parser already configured with your language
source_code_bytes = b'3 + 4'
tree = parser.parse(source_code_bytes)
node1 = tree.root_node.children[0].children[0]
node1_text = source_code_bytes[node1.start_byte:node1.end_byte].decode('utf8')
assert node1_text == '3'
In some language bindings, like the wasm binding, there is a .text helper for making this easier.
There is an open issue for adding this kind of helper function to the python binding.
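In the meantime it is easy to wrap the slicing yourself; the helper below continues the Python snippet from this answer, and the name node_text is made up for illustration.
def node_text(node, source_code_bytes: bytes) -> str:
    """Return the slice of the original source covered by a node."""
    return source_code_bytes[node.start_byte:node.end_byte].decode('utf8')

assert node_text(node1, source_code_bytes) == '3'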

No implicit conversion of Struct into Array

I was working with an array of structs and wanted to transpose them, but was met with an error:
TypeError (no implicit conversion of Struct into Array)
I'd thought (incorrectly) that such implicit conversion simply called to_a on each object when available, and structs do have a to_a method.
So my question is two-part:
if I can, how do I implement this implicit conversion?
secondly, why is this the case? Why can't structs be implicitly converted to arrays, yet can be explicitly converted?
Here's a minimal example to produce the error:
S = Struct.new(:a, :b)
a = S.new(1, 2)
# => #<struct S a=1, b=2>
b = S.new(3, 4)
# => #<struct S a=3, b=4>
[a, b].transpose
# TypeError (no implicit conversion of S into Array)
[a, b].map(&:to_a)
# => [[1, 2], [3, 4]]
# Therefore, the extra step I'd have to take to transpose:
[a, b].map(&:to_a).transpose
# => [[1, 3], [2, 4]]
Thanks in advance for any help.
I've actually found the answer to this while researching the question, so will pop the answer in as I couldn't find anything similar when searching earlier.
Ruby uses different coercion methods for explicit vs implicit conversion:
| Explicit | Implicit |
|----------|----------|
| to_i     | to_int   |
| to_s     | to_str   |
| to_a     | to_ary   |
| to_h     | to_hash  |
So the problem here is structs don't have a to_ary method:
a.to_ary
# NoMethodError (undefined method `to_ary' for #<struct S a=1, b=2>)
Therefore, if we define this method on the struct in question, we can implicitly convert:
S.define_method(:to_ary) do
  self.to_a
end
[a, b].transpose
# => [[1, 3], [2, 4]]
Voila :)
if I can, how do I implement this implicit conversion?
You can pass a block to Struct.new and then define your method there:
S = Struct.new(:a, :b) do
  alias_method :to_ary, :to_a
end
secondly, why is this the case? Why can't structs be implicitly converted to arrays, yet can be explicitly converted?
This is somewhat philosophical, but to_ary, to_int, etc. indicate that those objects really represent an array, integer, etc. respectively; they're just not exactly the right type.
It's different for the one-letter variants to_a, to_i, etc., which indicate that the object can stand in for an array, integer, and so on, even though its underlying structure is quite different or only represents a portion of its true semantics.
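A loose Python parallel (my analogy, not part of the answer above): Python likewise distinguishes objects that really are integers for indexing purposes (__index__, honoured implicitly) from objects that merely convert on explicit request (__int__, used by int()). The class names here are made up for illustration.
class Pages:
    """Really an integer count, just not of type int (cf. to_int / to_ary)."""
    def __init__(self, n):
        self.n = n
    def __index__(self):      # implicit conversion: "I really am an integer"
        return self.n

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius
    def __int__(self):        # explicit conversion only
        return round(self.celsius)

items = ['a', 'b', 'c']
print(items[Pages(1)])            # => 'b' (implicit, via __index__)
print(int(Temperature(21.6)))     # => 22 (explicit request)
# items[Temperature(1.0)]         # TypeError: no implicit conversion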

Difference between array << element and array.push(element), or string << "something" and string + "something", in Ruby

Before judging me for an irrelevant question, I'll safeguard myself: I know << is a bitwise operator. However, in both cases (array, string) it operates as just adding / concatenating values.
Any tip for clarifying whether there's a difference between array << element and array.push(element)?
Thanks
However, in both cases (array, string) it operates as just adding /
concatenating values.
It makes no difference "result-wise": in both cases you get a value containing both operands (in one form or another).
The difference shows up in the way the operands are affected:
one performs an in-place mutation
the other simply concatenates, without changing the original operands.
Consider the following examples:
a = 'a'
b = 'b'
Now:
# concatenation - no changes to original strings
a + b #=> "ab"
a #=> "a"
b #=> "b"
Whereas:
# mutation - original string is changed in-place
a << b #=> "ab"
a #=> "ab"
Same goes with arrays:
# concatenation - no changes to original arrays
a = ['a'] #=> ["a"]
b = ['b'] #=> ["b"]
a + b #=> ["a", "b"]
a #=> ["a"]
b #=> ["b"]
# mutation - original array is changed in-place
a << b #=> ["a", ["b"]]
a #=> ["a", ["b"]]
As to Array#push and Array#<<: they do the same thing, except that push also accepts multiple arguments.
Firstly, << is not a bit-wise operator. Nor is :<< or "<<". The first is not a Ruby object or keyword, :<< is a symbol and "<<" is a string. Fixnum#<<, by contrast, is a bit-wise operator, implemented as an instance method on the class Fixnum.
You may argue that it's obvious what you meant, but it's not. Many classes have instance methods of the same name that are unrelated. Several classes, for example, have methods called "<<", "+", "size", "replace", "select", "each" and on and on. The only way to speak meaningfully of an instance method, therefore, is to also give the class on which it is defined.
What is an "operator" in Ruby? Frankly, I don't know. I've never found a definition. Whatever it is, however, most of them are implemented as instance methods.
Many of Ruby's core methods have names that may seem unusual to those coming from other languages. Examples are "<<", "+" and "&". The important thing to remember is that these are perfectly-valid names. Let's try using them as you would any other method:
[1,2,3].<<(4) #=> [1, 2, 3, 4]
"cat".+("hat") #=> "cathat"
[1,2,3].&([2,4]) #=> [2]
The head Ruby monk knew that his disciples would prefer to write these as follows:
[1,2,3] << 4
"cat" + "hat"
[1,2,3] & [2,4]
so he said "OK", which when translated from Japanese to English means "OK". He simply designed the Ruby parser so that when it saw the latter form it would convert the expression to the standard form before parsing it further (or something like that). This has come to be called syntactic sugar. (Syntactic sugar doesn't allow you to write "cat" concat "hat", however; it only applies to names that are made up of symbols.)
My point with Ruby's operators is that most are implemented with garden-variety methods, albeit methods with odd-sounding names. Yes, there are methods String#+ and Array#+, but they are completely unrelated to each other. If they were instead named String#str_add and Array#arr_add and used like so:
"abc".str_add("def")
[1,2,3].arr_add([2,4])
you probably wouldn't be asking the question you've raised.
