Apache Camel - YAML DSL - Expression returning custom datatype

I am using the Camel Kamelets SFTP source to process files on an SFTP server. When the SFTP source downloads a file from the SFTP server, we need to set the file length in a header.
I have used set-header to set the file length, and it works except for the data type: we expect the header value to be a Long, but the simple expression returns a String. How can I return a Long from a simple expression (or any other expression)?
Does the YAML DSL support a result type on the simple expression?

You can use result-type.
However, you then need to use the verbose syntax to be able to set multiple options on the simple expression, something like:
- set-header:
    name: test
    simple:
      expression: "${body}"
      result-type: "long"

Related

How do I get a dataframe or database write from TFX BulkInferrer?

I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dict into a Pandas DataFrame, or store it as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
This must be a common use case, because TensorFlowModelAnalytics functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery), or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood" but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure if they're compatible?
Help?
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec beforehand. Do the following:
Set the BulkInferrer to write to output_examples rather than inference_result by adding an output_example_spec to the component construction.
Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples.
Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is necessary.
bulk_inferrer = BulkInferrer(
    ....
    output_example_spec=bulk_inferrer_pb2.OutputExampleSpec(
        output_columns_spec=[bulk_inferrer_pb2.OutputColumnsSpec(
            predict_output=bulk_inferrer_pb2.PredictOutput(
                output_columns=[bulk_inferrer_pb2.PredictOutputCol(
                    output_key='original_label_name',
                    output_column='output_label_column_name')]))]
    ))
statistics = StatisticsGen(
    examples=bulk_inferrer.outputs.output_examples
)
schema = SchemaGen(
    statistics=statistics.outputs.output,
)
After that, one can do the following:
import tensorflow as tf
from tfx.utils import io_utils
from tensorflow_transform.tf_metadata import schema_utils

# read schema from SchemaGen
schema_path = '/path/to/schemagen/schema.pbtxt'
schema_proto = io_utils.SchemaReader().read(schema_path)
spec = schema_utils.schema_as_feature_spec(schema_proto).feature_spec

# read inferred results
data_files = ['/path/to/bulkinferrer/output_examples/examples/examples-00000-of-00001.gz']
dataset = tf.data.TFRecordDataset(data_files, compression_type='GZIP')

# parse dataset with spec (parse_single_example, since the records are unbatched)
def parse(raw_record):
    return tf.io.parse_single_example(raw_record, spec)

dataset = dataset.map(parse)
At this point, the dataset is like any other parsed dataset, so it's trivial to write a CSV, or to a BigQuery table, or whatever from there. It certainly helped us in ZenML with our BatchInferencePipeline.
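For instance, here is a minimal sketch of dumping the parsed dataset to a CSV with pandas, assuming the feature spec produces dense tensors (sparse features would need tf.sparse.to_dense first):
import pandas as pd

rows = []
for record in dataset:
    # each record is a dict of feature name -> tensor after the map(parse) step
    rows.append({name: value.numpy() for name, value in record.items()})
pd.DataFrame(rows).to_csv('inferred_results.csv', index=False)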
Answering my own question here to document what we did, even though I think @Hamza Tahir's answer above is objectively better. This may provide an option for other situations where it's necessary to change the operation of an out-of-the-box TFX component. It's hacky though:
We copied and edited the file tfx/components/bulk_inferrer/executor.py, replacing this transform in the _run_model_inference() method's internal pipeline:
| 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
    os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
    file_name_suffix='.gz',
    coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
with this one:
| 'WritePredictionLogsBigquery' >> beam.io.WriteToBigQuery(
    'our_project:namespace.TableName',
    schema='SCHEMA_AUTODETECT',
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    custom_gcs_temp_location='gs://our-storage-bucket/tmp',
    temp_file_format='NEWLINE_DELIMITED_JSON',
    ignore_insert_ids=True,
)
(This works because when you import the BulkInferrer component, the per-node work gets farmed out to these executors running on the worker nodes, and TFX copies its own library onto those nodes. It doesn't copy everything from user-space libraries, though, which is why we couldn't just subclass BulkInferrer and import our custom version.)
We had to make sure the table at 'our_project:namespace.TableName' had a schema compatible with the model's output, but didn't have to translate that schema into JSON / AVRO.
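If autodetection is ever not an option, WriteToBigQuery also accepts an explicit schema; a hedged sketch, where the field names and types are placeholders that would have to match the model's output:
table_schema = {
    'fields': [
        {'name': 'userId', 'type': 'STRING', 'mode': 'REQUIRED'},     # placeholder
        {'name': 'prediction', 'type': 'FLOAT', 'mode': 'REPEATED'},  # placeholder
    ]
}
# then pass schema=table_schema instead of schema='SCHEMA_AUTODETECT'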
In theory, my group would like to make a pull request to TFX built around this, but for now we're hard-coding a couple of key parameters, and don't have the time to get this to a real public / production state.
I'm a little late to this party but this is some code I use for this task:
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2
import pandas as pd
from typing import List, Text

def parse_prediction_logs(inference_filenames: List[Text]) -> pd.DataFrame:
    """
    Args:
        inference_filenames: tf.io.gfile.glob(Inferrer artifact uri)
    Returns:
        a dataframe of userids, predictions, and features
    """
    def parse_log(pbuf):
        # parse the protobuf
        message = prediction_log_pb2.PredictionLog()
        message.ParseFromString(pbuf)
        # my model produces scores and classes and I extract the top-K classes (here K=10)
        predictions = [x.decode() for x in (message
                                            .predict_log
                                            .response
                                            .outputs['output_2']
                                            .string_val
                                            )[:10]]
        # here I parse the input tf.train.Example proto
        inputs = tf.train.Example()
        inputs.ParseFromString(message
                               .predict_log
                               .request
                               .inputs['input_1'].string_val[0]
                               )
        # you can pull out individual features like this
        uid = inputs.features.feature["userId"].bytes_list.value[0].decode()
        feature1 = [
            x.decode() for x in inputs.features.feature["feature1"].bytes_list.value
        ]
        feature2 = [
            x.decode() for x in inputs.features.feature["feature2"].bytes_list.value
        ]
        return (uid, predictions, feature1, feature2)

    return pd.DataFrame(
        [parse_log(x) for x in
         tf.data.TFRecordDataset(inference_filenames, compression_type="GZIP").as_numpy_iterator()
        ],
        columns=["userId", "predictions", "feature1", "feature2"]
    )

JMeter - converting File to String to Array

I need to do the following:
Read a .csv file into a variable. The CSV file has one single row with a string like (110,111,112,113,114).
Using this String variable, split the content on the comma ",".
What I have done:
I have added a Thread Group.
2a. Added a 'User Defined Variables' config element.
2b. Added a variable named 'issueIds' having value ${__FileToString(D:\TestCasesId.csv,,issueIds)}
3a. Now I added a JSR223 Sampler with the following code:
String lineItems1 = ${issueIds};
log.info(lineItems1);
3b. Executing this gives the following error:
Response code:500
Response message:javax.script.ScriptException: In file: inline evaluation of: ``String lineItems1 = 114660,114661,114662,114663; log.info(lineItems1); ;'' Encountered "114661" at line 1, column 28.
in inline evaluation of: ``String lineItems1 = 114660,114661,114662,114663; log.info(lineItems1); ;'' at line number 1
4a. Added a BeanShell Sampler with the following script:
String lineItems2 = ${issueIds};
String[] lineItems2Arr = lineItems2.split(",");
log.info(lineItems2);
log.info(lineItems2Arr[0]);
4b. Executing this gives the following error:
Response code:500
Response message:org.apache.jorphan.util.JMeterException: Error invoking bsh method: eval In file: inline evaluation of: ``String lineItems2 = 114660,114661,114662,114663; String[] lineItems2Arr = lineIt . . . '' Encountered "114661" at line 1, column 28.
What am I doing wrong?
You are doing two things wrong:
Inlining JMeter functions or variables into scripting elements is not recommended; you should use the vars shorthand for the JMeterVariables class instance instead, like:
String lineItems1 = vars.get("issueIds");
Since JMeter 3.1 it's recommended to use JSR223 test elements and the Groovy language for scripting, so consider choosing groovy from the language drop-down.
Groovy has much better performance compared to Beanshell, it supports all modern Java SDK features, and it provides some syntactic sugar on top of them; check out the Apache Groovy - Why and How You Should Use It article for more details.
If the number of comma-separated fields is the same for all the CSV files used, you can consider using the 'CSV Data Set Config' instead of manual splitting. In that case you will have a separate variable for each column in the CSV, e.g.
id1,id2,id3,id4,id5
110,111,112,113,114

Use content of a tuple as session variable

I extracted from a previous response a list of tuples with the following regex:
.check(regex(""""idSc":(.{1,8}),"pasTemps":."codePasTemps":(.),"""").ofType[(String,String)].findAll.saveAs ("OBJECTS1"))
So I get my object:
OBJECTS1 -> List((1657751,2), (1658105,2), (4557378,2), (1657750,1), (916,1), (917,2), (1658068,1), (1658069,2), (4557379,2), (1658082,1), (4557367,1), (4557368,1), (1660865,2), (1660866,2), (1658122,1), (921,1), (922,2), (923,2), (1660875,1), (1660876,2), (1660877,2), (1658300,1), (1658301,1), (1658302,1), (1658309,1), (1658310,1), (2996562,1), (4638455,1))
After that I did a foreach and need to extract each couple to add them to the next requests, so we tried:
.foreach("${OBJECTS1}", "couple") {
exec(http("request_foreach47"
.get("/ctr/web/api/seriegraph/bydates/${couple(0)}/${couple(1)}/1552863600000/1554191743799")
.headers(headers_27))
}
But I get the message: named 'couple' does not support index access
I also thought that using two regexes on the couple to extract both parts could work, but I haven't found any way to use a regex on a session variable. (Even if it's not needed for this case, I'm really interested to learn how, as it could be useful.)
I would be really thankful if you could help me. (I'm using Gatling 2 but can't use a more recent version, as it's for work and other scripts have been developed with Gatling 2.)
each "couple" is a scala tuple which can't be indexed into like a collection. Fortunately the gatling EL has a function that handles tuples.
so instead of
.get("/ctr/web/api/seriegraph/bydates/${couple(0)}/${couple(1)}/1552863600000/1554191743799")
you can use
.get("/ctr/web/api/seriegraph/bydates/${couple._1}/${couple._2}/1552863600000/1554191743799")

Confusion with the format of the HTTP request header values?

According to RFC 2616 section 3.11, the format of an entity tag is the following:
entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string
And the examples given for the "If-Match" condition in section 14.24 of RFC 2616 are the following:
If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *
I'm doing a project in C where I'll parse the HTTP requests from different clients. The web server is also written in C, and from the web server I can get the request headers and values as const char* and parse them. But my confusion is whether the value of the "If-Match" header will look like "xyzzy, r2d2xxxx ,c3piozzzz" or like ""xyzzy", "r2d2xxxx", "c3piozzzz"". Do you know which one is right? And will there always be a space between the ETags in the If-Match header value if it has a list of entities? I mean, will the format be the following?
If-Match: "one-entity-tag",[space]"second-entity-tag",[space]"third-entity-tag"
There is no description of the format of the If-Match header value when it has a list of ETags; RFC 2616 only gives the examples I showed above. Are those examples reliable?
You can trust the spec and the examples: the double quote is really part of the ETag.
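To make that concrete, here is a minimal sketch (in Python rather than C, purely as an illustration) of pulling the individual entity-tags out of an If-Match value; note it does not rely on the whitespace after each comma, which is optional:
import re

# An entity-tag is an optional weak prefix W/ followed by a quoted opaque-tag;
# If-Match may also be the single special value *. The double quotes are kept
# because they are part of the ETag itself.
ETAG_RE = re.compile(r'(?:W/)?"[^"]*"|\*')

def parse_if_match(value):
    return ETAG_RE.findall(value)

print(parse_if_match('"xyzzy", "r2d2xxxx","c3piozzzz"'))
# ['"xyzzy"', '"r2d2xxxx"', '"c3piozzzz"']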

Dozer map Text to String

I'm using GWT and GAE for my project. I'm using data transfer objects and Dozer to move data between client and server. Dozer had been working great, but I have some classes that need to store text over 500 characters, so I must use the com.google.appengine.api.datastore.Text datatype in my server-side object but a regular String in my client-side object. How do I map these two types using Dozer? I know I can somehow specify an XML file, but how do I write that XML file?
Specify a mapping between the two datatypes as below. Dozer will use it at run time to convert.
<mapping>
  <class-a>com.google.appengine.api.datastore.Text</class-a>
  <class-b>java.lang.String</class-b>
</mapping>
In case you don't know how to load the config file, in your code:
DozerBeanMapper beanMapper = new DozerBeanMapper();
beanMapper.setMappingFiles(Arrays.asList("name-of-the-dozer-mapping-file.xml"));
