How do I generate a UUID in Crystal?

I'm experimenting with the Crystal standard library and was wondering how to generate a UUID? The UUID.new(..) constructors all seem to expect arguments, but how do I just generate a random one?

I was looking at the wrong part of the standard library documentation -- just a bit below the constructors is the .random class method, which requires no arguments and generates a new UUID.
Usage example:
require "uuid"
puts "New UUID: #{UUID.random}"
# Output:
# New UUID: bfc5a3cf-a138-4323-881b-764e1e798ce4

Related

Compile error when trying to use the jOOQ ANY operator in Kotlin

I have problems using jOOQ with Kotlin and the ANY clause.
Given the following:
I have a database field in a PostgreSQL database which is an array
I have search parameters which are a List of Strings
I want to use the jOOQ ANY operator to search in the array
I have the following code, which is not working:
fun findAll(
    someArrayListOfStrings: List<String>?
): List<SomeDTO> {
    val filters = ArrayList<Condition>()
    filters.add(TABLE.SOME_FIELD.eq(DSL.any(someArrayListOfStrings)))
}
Here I want to dynamically create filters (jOOQ Conditions) to be added to some SQL statement. It should check whether SOME_FIELD (a PostgreSQL array type) contains one of the given strings, using the ANY clause (PostgreSQL jOOQ binding). However, I get the following compile-time error:
None of the following functions can be called with the arguments supplied:
public abstract fun eq(p0: Array<(out) String!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: Field<Array<(out) String!>!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: QuantifiedSelect<out Record1<Array<(out) String!>!>!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: Select<out Record1<Array<(out) String!>!>!>!): Condition defined in org.jooq.TableField
But my function call should match the third overload, where QuantifiedSelect is used.
I looked for hours on the internet but was not able to find any solution; every site I found told me to try the solution I already have. Does anyone have an idea what I could try and why it does not work?
Thank you!
The method you're calling here is DSL.any(T...), which takes a generic varargs array (in Java). You're passing a List<String>, so this binds T = List<String>, which doesn't satisfy the type constraint on the eq() method.
But even if you changed that to an Array<String>, it wouldn't work because the jOOQ ANY operator doesn't do the exact same thing as the PostgreSQL any(array) operator. So, just resort to either plain SQL templating:
condition("{0} = any({1})", TABLE.SOME_FIELD,
    DSL.value(someArrayListOfStrings.toTypedArray()))
Or just use the IN predicate (note the backticks, since in is a keyword in Kotlin):
TABLE.SOME_FIELD.`in`(someArrayListOfStrings)

How do I get a dataframe or database write from TFX BulkInferrer?

I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dicts into a Pandas DataFrame, or store them as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
This must be a common use case, because TensorFlowModelAnalytics functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery), or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood", but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure whether they're compatible.
Help?
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec beforehand. Do the following:
Set the BulkInferrer to write to output_examples rather than inference_result by adding an output_example_spec to the component construction.
Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples.
Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is necessary.
bulk_inferrer = BulkInferrer(
    ....
    output_example_spec=bulk_inferrer_pb2.OutputExampleSpec(
        output_columns_spec=[bulk_inferrer_pb2.OutputColumnsSpec(
            predict_output=bulk_inferrer_pb2.PredictOutput(
                output_columns=[bulk_inferrer_pb2.PredictOutputCol(
                    output_key='original_label_name',
                    output_column='output_label_column_name', )]))]
    ))

statistics = StatisticsGen(
    examples=bulk_inferrer.outputs.output_examples
)

schema = SchemaGen(
    statistics=statistics.outputs.output,
)
After that, one can do the following:
import tensorflow as tf
from tfx.utils import io_utils
from tensorflow_transform.tf_metadata import schema_utils

# read schema from SchemaGen
schema_path = '/path/to/schemagen/schema.pbtxt'
schema_proto = io_utils.SchemaReader().read(schema_path)
spec = schema_utils.schema_as_feature_spec(schema_proto).feature_spec

# read inferred results
data_files = ['/path/to/bulkinferrer/output_examples/examples/examples-00000-of-00001.gz']
dataset = tf.data.TFRecordDataset(data_files, compression_type='GZIP')

# parse dataset with spec
def parse(raw_record):
    return tf.io.parse_example(raw_record, spec)

dataset = dataset.map(parse)
At this point, the dataset is like any other parsed dataset, so it's trivial to write it out as a CSV, to a BigQuery table, or whatever else from there. It certainly helped us in ZenML with our BatchInferencePipeline.
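For example, here is a minimal sketch of dumping the parsed dataset to a CSV with pandas. It assumes the feature_spec contains only dense (FixedLenFeature) features, and the output path is a placeholder:

import pandas as pd

rows = []
for record in dataset.as_numpy_iterator():
    # each parsed record is a dict of feature name -> numpy value
    rows.append({name: value.tolist() for name, value in record.items()})

# '/path/to/output.csv' is a placeholder path
pd.DataFrame(rows).to_csv('/path/to/output.csv', index=False)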
Answering my own question here to document what we did, even though I think @Hamza Tahir's answer below is objectively better. This may provide an option for other situations where it's necessary to change the operation of an out-of-the-box TFX component. It's hacky though:
We copied and edited the file tfx/components/bulk_inferrer/executor.py, replacing this transform in the _run_model_inference() method's internal pipeline:
| 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
    os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
    file_name_suffix='.gz',
    coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
with this one:
| 'WritePredictionLogsBigquery' >> beam.io.WriteToBigQuery(
    'our_project:namespace.TableName',
    schema='SCHEMA_AUTODETECT',
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    custom_gcs_temp_location='gs://our-storage-bucket/tmp',
    temp_file_format='NEWLINE_DELIMITED_JSON',
    ignore_insert_ids=True,
)
(This works because when you import the BulkInferrer component, the per-node work gets farmed out to these executors running on the worker nodes, and TFX copies its own library onto those nodes. It doesn't copy everything from user-space libraries, though, which is why we couldn't just subclass BulkInferrer and import our custom version.)
We had to make sure the table at 'our_project:namespace.TableName' had a schema compatible with the model's output, but didn't have to translate that schema into JSON / Avro.
In theory, my group would like to make a pull request to TFX built around this, but for now we're hard-coding a couple of key parameters, and don't have the time to get this to a real public / production state.
I'm a little late to this party, but this is some code I use for this task:
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2
import pandas as pd
from typing import List, Text


def parse_prediction_logs(inference_filenames: List[Text]) -> pd.DataFrame:
    """
    Args:
        inference_filenames: tf.io.gfile.glob(Inferrer artifact uri)
    Returns:
        a dataframe of userIds, predictions, and features
    """

    def parse_log(pbuf):
        # parse the protobuf
        message = prediction_log_pb2.PredictionLog()
        message.ParseFromString(pbuf)
        # my model produces scores and classes and I extract the top-K classes
        predictions = [x.decode() for x in (message
                                            .predict_log
                                            .response
                                            .outputs['output_2']
                                            .string_val
                                            )[:10]]
        # here I parse the input tf.train.Example proto
        inputs = tf.train.Example()
        inputs.ParseFromString(message
                               .predict_log
                               .request
                               .inputs['input_1'].string_val[0]
                               )
        # you can pull out individual features like this
        uid = inputs.features.feature["userId"].bytes_list.value[0].decode()
        feature1 = [
            x.decode() for x in inputs.features.feature["feature1"].bytes_list.value
        ]
        feature2 = [
            x.decode() for x in inputs.features.feature["feature2"].bytes_list.value
        ]
        return (uid, predictions, feature1, feature2)

    return pd.DataFrame(
        [parse_log(x) for x in
         tf.data.TFRecordDataset(inference_filenames, compression_type="GZIP").as_numpy_iterator()
         ], columns=["userId", "predictions", "feature1", "feature2"]
    )
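A usage sketch (the artifact path below is a placeholder; BulkInferrer writes its prediction logs as gzipped TFRecord files under the inference_result artifact URI):

import os

# hypothetical location of the BulkInferrer inference_result artifact
inferrer_uri = '/path/to/bulkinferrer/inference_result'
df = parse_prediction_logs(tf.io.gfile.glob(os.path.join(inferrer_uri, '*.gz')))
print(df.head())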

Limit Array to multiple specific data types

I am working on refactoring a tool to OOP in PS5.
I have a number of classes. Sets can contain other Sets as well as Packages. Packages can contain other Packages and Tasks. And Tasks can contain other Tasks. For example...
Set1
    Package1.1
        Task1.1
Set2
    Package2.1
        Task2.1
    Set2A
        Package2A
            Task2A.1
            Task2A.2
    Package2.2
        Task2.2
Set3
    Package3.1
        Task3.1
            Task3.1A
I plan to have Set, Package and Task classes, but there are a number of different Tasks with some common features and some unique, so I will have a base Task class that is then extended by the various final task classes.
My question relates to the data structure to contain the nested objects. If each class could only contain the next deeper type everything would be easy; the variable to hold the Packages in a Set could be an array of Packages, i.e. [Package[]]$Contents.
I could make it super flexible and just use an array, [Array]$Contents, but that allows for invalid items like strings and such.
Alternatively I could have some sort of Root class, with Sets, Packages and Tasks all extending that, and final Tasks then extending Tasks, and use [Root[]]$Contents or some such. But that might not be possible, and it would still allow for adding a Task to a Set, since a final Task class would ultimately be extending from Root.
So, the question becomes: can you define an array that accepts multiple possible types but is still limited, something like [Set/Package[]]$Contents? Or is there perhaps a totally different way to define a variable that limits the valid members? An Enum seems to have potential, but it seems like they are limited to strings; I tried
enum AllowedTypes {
    [Array]
    [Strings]
}
and that is no good.
Or am I best off just using an Array and validating what I am adding in the Add method of each class? I can see a possible solution there where I have overloaded Add methods in the Set class: one that takes a Set, one that takes a Package, and one that takes a generic object and throws an error to log. That assumes the more specific overload takes priority, rather than everything going to the generic method since it's technically valid. Or perhaps that generic overload won't even work, since the collection of overloaded Add methods technically can't collapse to one valid choice, because a Set is both a [Set] and a [PSObject], I guess.
PetSerAl, as countless times before, has provided an excellent (uncommented) solution in a comment on the question, without coming back to post that solution as an answer.
Given the limits of code formatting in comments, it's worth presenting the solution in a more readable format; additionally, it has been streamlined, modularized, extended, and commented:
In short: a PowerShell custom class (PSv5+) is used to subclass standard type [System.Collections.ObjectModel.Collection[object]] in order to limit adding elements to a list of permitted types passed to the constructor.
class MyCollection : System.Collections.ObjectModel.Collection[object] {

    # The types an instance of this collection is permitted to store
    # instances of, initialized via the constructor.
    [Type[]] $permittedTypes

    # The only constructor, to which the permitted types must be passed.
    MyCollection([Type[]] $permittedTypes) { $this.permittedTypes = $permittedTypes }

    # Helper method to determine if a given object is of a permitted type.
    [bool] IsOfPermittedType([object] $item) {
        return $this.permittedTypes.Where({ $item -is $_ }, 'First')
    }

    # Hidden helper method for ensuring that an item about to be inserted / added
    # / set is of a permissible type; throws an exception, if not.
    hidden AssertIsOfPermittedType([object] $item) {
        if (-not $this.IsOfPermittedType($item)) {
            Throw "Type not permitted: $($item.GetType().FullName)"
        }
    }

    # Override the base class' .InsertItem() method to add type checking.
    # Since the original method is protected, we mark it as hidden.
    # Note that the .Add() and .Insert() methods don't need overriding, because they
    # are implemented via this method.
    hidden InsertItem([int] $index, [object] $item) {
        $this.AssertIsOfPermittedType($item)
        ([System.Collections.ObjectModel.Collection[object]] $this).InsertItem($index, $item)
    }

    # Override the base class' .SetItem() method to add type checking.
    # Since the original method is protected, we mark it as hidden.
    # This method is implicitly called when indexing ([...]) is used.
    hidden SetItem([int] $index, [object] $item) {
        $this.AssertIsOfPermittedType($item)
        ([System.Collections.ObjectModel.Collection[object]] $this).SetItem($index, $item)
    }

    # Note: Since the *removal* methods (.Remove(), .RemoveAt())
    # need no type checking, there is no need to override them.
}
With the above class defined, here's sample code that exercises it:
# Create an instance of the custom collection type, passing integers and strings
# as the only permitted types.
# Note the (...) around the type arguments, because they must be passed
# as a *single argument* that is an *array*.
# Without the inner (...) PowerShell would try to pass them as *individual arguments*.
$myColl = [MyCollection]::new(([int], [string]))
# OK, add an [int]
# .Add() implicitly calls the overridden .InsertItem() method.
$myColl.Add(1)
$myColl.Add('hi') # OK, add a [string]
# OK, override the 1st element with a different [int]
# (though a [string] would work too).
# This implicitly calls the overridden .SetItem() method.
$myColl[0] = 2
# OK - insert a [string] item at index 0
$myColl.Insert(0, 'first')
# $myColl now contains: 'first', 2, 'hi'
# Try to add an impermissible type:
$myColl.Add([long] 42)
# -> Statement-terminating error:
# 'Exception calling "Add" with "1" argument(s): "Type not permitted: System.Int64"'

How to use Collections.binarySearch() in a CodenameOne project

I am used to being able to perform a binary search of a sorted list of, say, Strings or Integers, with code along the lines of:
Vector<String> vstr = new Vector<String>();
// etc...
int index = Collections.binarySearch (vstr, "abcd");
I'm not clear on how Codename One handles standard Java methods and classes, but it looks like this could be fixed easily if classes like Integer and String (or the Codename One versions of these) implemented the Comparable interface.
Edit: I now see that code along the lines of the following will do the job.
int index = Collections.binarySearch(vstr, "abcd", new Comparator<String>() {
    @Override
    public int compare(String object1, String object2) {
        return object1.compareTo(object2);
    }
});
Adding the Comparable interface (to the various primitive "wrappers") would also make it easier to use Collections.sort (another very useful method :-))
You can also sort with a comparator, but I agree, this is one of the important enhancements we need to provide in the native VMs on the various platforms; personally this is my biggest peeve in our current VM.
Can you file an RFE on that and mention it as a comment in the Number issue?
If we are doing that change, we might as well do both.

Printing/exporting a public key on AppEngine PyCrypto

Google AppEngine currently uses an old version of PyCrypto.
After making an RSA key, I can't find any way to export the public key.
Alas, the docs for PyCrypto 2.0.1 currently 404, and the .export methods I see in current code don't work on PyCrypto 2.0.1:
Making the keypair:
rsa_key = RSA.generate(384, random_generator)
Checking methods available:
In [84]: rsa_key.publickey. <tab>
RSAkey.publickey.__call__ RSAkey.publickey.__func__ RSAkey.publickey.__reduce__ RSAkey.publickey.__str__
RSAkey.publickey.__class__ RSAkey.publickey.__get__ RSAkey.publickey.__reduce_ex__ RSAkey.publickey.__subclasshook__
RSAkey.publickey.__cmp__ RSAkey.publickey.__getattribute__ RSAkey.publickey.__repr__ RSAkey.publickey.im_class
RSAkey.publickey.__delattr__ RSAkey.publickey.__hash__ RSAkey.publickey.__self__ RSAkey.publickey.im_func
RSAkey.publickey.__doc__ RSAkey.publickey.__init__ RSAkey.publickey.__setattr__ RSAkey.publickey.im_self
RSAkey.publickey.__format__ RSAkey.publickey.__new__ RSAkey.publickey.__sizeof__
Printing doesn't work.
It should be possible to use the pickle module, provided interoperability is not that important to you.
import pickle
keyout = pickle.dumps(rsa_key)
# Save keyout into a file or a db
[ ... ]
# Retrieve keyin from the same file or db
rsa_key = pickle.loads(keyin)
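If only the public half is needed, the same trick should work on the object returned by publickey() (visible in the tab completion above), assuming it returns a public-only key object as it does in later PyCrypto versions:

# sketch: pickle only the public part of the key
pub_pickled = pickle.dumps(rsa_key.publickey())
# ... later ...
public_key = pickle.loads(pub_pickled)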
Just take a look at the code to see which attributes the key object exposes:
def generate(bits, randfunc, progress_func=None):
    """generate(bits:int, randfunc:callable, progress_func:callable)

    Generate an RSA key of length 'bits', using 'randfunc' to get
    random data and 'progress_func', if present, to display
    the progress of the key generation.
    """
    obj=RSAobj()

    # Generate random number from 0 to 7
    difference=ord(randfunc(1)) & 7

    # Generate the prime factors of n
    if progress_func: progress_func('p\n')
    obj.p=pubkey.getPrime(bits/2, randfunc)
    if progress_func: progress_func('q\n')
    obj.q=pubkey.getPrime((bits/2)+difference, randfunc)

    obj.n=obj.p*obj.q

    # Generate encryption exponent
    if progress_func: progress_func('e\n')
    obj.e=pubkey.getPrime(17, randfunc)
    if progress_func: progress_func('d\n')
    obj.d=pubkey.inverse(obj.e, (obj.p-1)*(obj.q-1))
    return obj
This site has a good explanation of what each variable means.
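Building on that, here is a minimal sketch of exporting just the public components by hand and rebuilding a key object from them later. It assumes RSA.construct() is available in the installed PyCrypto version (it is present in the 2.x series), and the storage format is just an illustration:

from Crypto.PublicKey import RSA

# the public key is just the modulus n and the public exponent e
pub_n, pub_e = rsa_key.n, rsa_key.e

# store the two integers however you like, e.g. as a string
stored = '%d:%d' % (pub_n, pub_e)

# later: rebuild a public-key object from the stored components
# (Python 2 / PyCrypto 2.x era code, matching the generate() source above)
n, e = [int(part) for part in stored.split(':')]
public_key = RSA.construct((n, e))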
