How to match input/output with sagemaker batch transform? - amazon-sagemaker

I'm using SageMaker batch transform with JSON input files (see below for sample input and output files). I have the custom inference code below and I'm using json.dumps to return the prediction, but it's not returning JSON. I tried using "DataProcessing": {"JoinSource": "string"} to match input and output, but I'm getting an "unable to marshal ..." error. I think this is because output_fn is returning a list (or an array of lists) rather than JSON, which is why it is unable to match the input with the output. Any suggestions on how I should return the data?
Inference code
def model_fn(model_dir):
    ...

def input_fn(data, content_type):
    ...

def predict_fn(data, model):
    ...

def output_fn(prediction, accept):
    if accept == "application/json":
        return json.dumps(prediction)
    raise RuntimeError("{} accept type is not supported by this script.".format(accept))
input file
{"data" : "input line one" }
{"data" : "input line two" }
....
output file
["output line one" ]
["output line two" ]
{
"BatchStrategy": SingleRecord,
"DataProcessing": {
"JoinSource": "string",
},
"MaxConcurrentTransforms": 3,
"MaxPayloadInMB": 6,
"ModelClientConfig": {
"InvocationsMaxRetries": 1,
"InvocationsTimeoutInSeconds": 3600
},
"ModelName": "some-model",
"TransformInput": {
"ContentType": "string",
"DataSource": {
"S3DataSource": {
"S3DataType": "string",
"S3Uri": "s3://bucket-sample"
}
},
"SplitType": "Line"
},
"TransformJobName": "transform-job"
}

json.dumps will serialize a list as a JSON array; it will not convert your array into a dict structure for you.
What data type is prediction? Have you tested to make sure prediction is a dict?
You can confirm the data type by adding print(type(prediction)) and checking the CloudWatch Logs.
If prediction is a list, you can test the following:
def output_fn(prediction, accept):
    if accept == "application/json":
        my_dict = {"output": prediction}
        return json.dumps(my_dict)
    raise RuntimeError("{} accept type is not supported by this script.".format(accept))
DataProcessing and JoinSource are used to associate each input record with the prediction it produced in the output. They are not meant to be used to match the input and output format.
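For reference, once output_fn returns a valid JSON object, the join configuration could look like the sketch below (the "$" OutputFilter simply keeps the whole prediction; both values are illustrative):

"DataProcessing": {
    "JoinSource": "Input",
    "OutputFilter": "$"
}

With JSON input and JSON output, SageMaker should then emit one joined record per line, with the prediction attached under a SageMakerOutput attribute, e.g. {"data": "input line one", "SageMakerOutput": {"output": ["output line one"]}}.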

Related

To fetch elements and compare elements from JSON object in Ruby

I have many JSON resources similar to the one below, but I need to fetch only the JSON resource that satisfies two conditions:
(1) component.code.text == Diastolic Blood Pressure
(2) valueQuantity.value < 90
This is the JSON object/resource
{
"fullUrl": "urn:uuid:edf9439b-0173-b4ab-6545 3b100165832e",
"resource": {
"resourceType": "Observation",
"id": "edf9439b-0173-b4ab-6545-3b100165832e",
"component": [ {
"code": {
"coding": [ {
"system": "http://loinc.org",
"code": "8462-4",
"display": "Diastolic Blood Pressure"
} ],
"text": "Diastolic Blood Pressure"
},
"valueQuantity": {
"value": 81,
"unit": "mm[Hg]",
"system": "http://unitsofmeasure.org",
"code": "mm[Hg]"
}
}, {
"code": {
"coding": [ {
"system": "http://loinc.org",
"code": "8480-6",
"display": "Systolic Blood Pressure"
} ],
"text": "Systolic Blood Pressure"
},
"valueQuantity": {
"value": 120,
"unit": "mm[Hg]",
"system": "http://unitsofmeasure.org",
"code": "mm[Hg]"
}
} ]
}
}
I need to write a condition to fetch the resource with text: "Diastolic Blood Pressure" AND valueQuantity.value > 90
I have written the following code:
def self.hypertension_observation(bundle)
entries = bundle.entry.select {|entry| entry.resource.is_a?(FHIR::Observation)}
observations = entries.map {|entry| entry.resource}
hypertension_observation_statuses = ((observations.select {|observation| observation&.component&.at(0)&.code&.text.to_s == 'Diastolic Blood Pressure'}) && (observations.select {|observation| observation&.component&.at(0)&.valueQuantity&.value.to_i >= 90}))
end
I am getting the output without any error, but the second condition is not being satisfied: the output even contains values < 90.
Could anyone help correct this Ruby code so that it fetches only the output which contains value < 90?
I will write out what I would do for a problem like this, based on the (edited) version of your JSON data. I'm inferring that the full JSON file is some list of records with medical data, and that we want to fetch only the records for which the individual's diastolic blood pressure reading is < 90.
If you want to do this in Ruby, I recommend using the JSON parser that comes with your Ruby distribution. It takes some (hopefully valid) JSON data and returns a Ruby array of hashes, each with nested arrays and hashes. In my solution I saved the JSON you posted to a file, so I would do something like this:
require 'json'
require 'pp'

json_data = File.read("medical_info.json")
parsed_data = JSON.parse(json_data)

fetched_data = []

parsed_data.each do |record|
  # the readings are nested under resource -> component;
  # in the sample data the diastolic reading is the first component
  diastolic = record["resource"]["component"][0]
  diastolic_text = diastolic["code"]["text"]
  diastolic_value_quantity = diastolic["valueQuantity"]["value"]

  # keep the record only if it is a diastolic reading below 90
  if diastolic_text == "Diastolic Blood Pressure" && diastolic_value_quantity < 90
    fetched_data << record
  end
end

pp fetched_data
This will print a new array of hashes containing only the results with the desired diastolic values. pp ('pretty print', from Ruby's standard library) isn't perfect, but it makes the hierarchy a little easier to read.
I find that when faced with deeply nested JSON data that I want to parse in Ruby, I will save the JSON data to a file, as I did here, and then in the directory where the file is, I run IRB so I can just play with accessing the hash values and array elements that I'm looking for.
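For example, from the directory containing the file, an irb session like the sketch below lets you poke at the structure (assuming, as above, that the full file holds a list of records):

$ irb
irb(main):001:0> require 'json'
irb(main):002:0> data = JSON.parse(File.read("medical_info.json"))
irb(main):003:0> data[0]["resource"]["component"][0]["valueQuantity"]["value"]
=> 81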

Replace array values if they exist with jq

While I use jq a lot, I do so mostly for simpler tasks. This one has tied my brain into a knot.
I have some JSON output from unit tests which I need to modify. Specifically, I need to remove (or replace) an error value because the test framework generates output that is hundreds of lines long.
The JSON looks like this:
{
"numFailedTestSuites": 1,
"numFailedTests": 1,
"numPassedTestSuites": 1,
"numPassedTests": 1,
...
"testResults": [
{
"assertionResults": [
{
"failureMessages": [
"Error: error message here"
],
"status": "failed",
"title": "matches snapshot"
},
{
"failureMessages": [
"Error: another error message here",
"Error: yet another error message here"
],
"status": "failed",
"title": "matches another snapshot"
}
],
"endTime": 1617720396223,
"startTime": 1617720393320,
"status": "failed",
"summary": ""
},
{
"assertionResults": [
{
"failureMessages": [],
"status": "passed",
},
{
"failureMessages": [],
"status": "passed",
}
]
}
]
}
I want to replace each element in failureMessages with either a generic failed message or with a truncated version (let's say 100 characters) of itself.
The tricky part (for me) is that failureMessages is an array and can have 0-n values and I would need to modify all of them.
I know I can find non-empty arrays with select(.. | .failureMessages | length > 0) but that's as far as I got, because I don't need to actually select items, I need to replace them and get the full JSON back.
The simplest solution is:
.testResults[].assertionResults[].failureMessages[] |= .[0:100]
Check it online!
The online example keeps only the first 10 characters of the failure messages to show the effect on the sample JSON you posted in the question (it contains short error messages).
Read about array/string slice (.[a:b]) and update assignment (|=) in the JQ documentation.
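If you would rather substitute a generic message than truncate, the same update assignment works with a constant string on the right-hand side (a sketch; the placeholder text is arbitrary):

.testResults[].assertionResults[].failureMessages[] |= "failure message omitted"

This walks the same paths but assigns the constant to every element, returning the full JSON back with only the failureMessages entries changed.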

Getting values from json array using an array of object and keys in Python

I'm a Python newbie and I'm trying to write a script that extracts JSON values by passing the keys dynamically, reading them from a CSV.
First of all, this is my first post, so I'm sorry if my questions are banal and the code is incomplete, but it's just pseudo code to illustrate the problem (I hope not to complicate it...).
The following partial code retrieves the values for three keys (group, user, and id or username), but I'd like to load the objects and keys from a CSV to make them dynamic.
Input json
{
"fullname": "The Full Name",
"group": {
"user": {
"id": 1,
"username": "John Doe"
},
"location": {
"x": "1234567",
"y": "9876543"
}
},
"color": {
"code": "ffffff",
"type" : "plastic"
}
}
Python code...
...
url = urlopen(jsonFile)
data = json.loads(url.read())
id = (data["group"]["user"]["id"])
username = (data["group"]["user"]["username"])
...
File.csv loaded into an array. Each line contains one or more keys.
fullname;
group,user,id;
group,user,username;
group,location,x;
group,location,y;
color,code;
The questions are: can I use a variable containing the object or key to be extracted?
And how can I specify how many keys there are in the keys array when indexing into data[ ][ ]..., using only one line?
Something like this pseudo code:
...
url = urlopen(jsonFile)
data = json.loads(url.read())
...
keys = line.split(',')
...
# using keys[] to identify the objects and keys
value = (data[keys[0]][keys[1]][keys[2]])
...
But the line value = (data[keys[0]][keys[1]][keys[2]]) would have to match the exact number of keys read from each CSV line.
Or must I write "if" branches like these?:
...
if len(keys) == 3:
value = (data[keys[0]][keys[1]][keys[2]])
if len(keys) == 2:
value = (data[keys[0]][keys[1]])
...
Many thanks!
I'm not sure I completely understand your question, but I would suggest you try playing with pandas. It might be as easy as this:
import pandas as pd
df = pd.read_json(<yourJsonFile>, orient='columns')
name = df.fullname[0]
group_user = df.group.user
group_location = df.group.location
color_type = df.color.type
color_code = df.color.code
(Where group_user and group_location will be python dictionaries).
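If you'd rather stay with the standard library, a key path of any length (like the lines in your CSV) can be walked in one statement with functools.reduce, so no "if" chains are needed. A minimal sketch, with the document from the question inlined for self-containment:

import json
from functools import reduce

# the JSON document from the question, already parsed
# (in your code this is data = json.loads(url.read()))
data = json.loads('''{
    "fullname": "The Full Name",
    "group": {"user": {"id": 1, "username": "John Doe"},
              "location": {"x": "1234567", "y": "9876543"}},
    "color": {"code": "ffffff", "type": "plastic"}
}''')

# one line from the CSV, e.g. "group,user,id;"
line = "group,user,id;"
keys = line.rstrip(";").split(",")

# start from the whole document and follow one key per step,
# so the path may contain any number of keys
value = reduce(lambda node, key: node[key], keys, data)
print(value)  # -> 1

reduce feeds each key to the lambda in turn, so a one-key line like fullname; and a three-key line like group,user,id; are handled by the same statement.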

How to extract values out of a array using JSON Extractor in jmeter?

I want to extract the JSON below and use the values accordingly.
Input JSON:
{
"status": "Success",
"message": "User created successfully",
"id": [
131188,
131191
]
}
Here I want the values of the id field. I used a JSON Extractor and gave the expression $.id, which gives me [131188,131191] in a variable. Now I want to use the individual values out of this array, i.e. 131188 and 131191.
Any idea how to do it?
Update: I don't want to use 2 JSON Extractors.
Just add [*] to your JSON path expression as below
$.id[*]
This will create a JMeter variable for each value. Note that you should use -1 in the Match No. field; JMeter will then expose the values as id_1, id_2, ..., with id_matchNr holding the number of matches.
You could also use a "JSR223 PostProcessor" with the Groovy language (here with the JSON inlined instead of taken from an extractor). An example:
import groovy.json.JsonSlurper
//String jsonString = vars.get("jsonFromExtractor")
String jsonString = '''
{
"status": "Success",
"message": "User created successfully",
"id": [
131188,
131191
]
}
'''
log.info("jsonString:" + jsonString)
def json = new JsonSlurper().parseText( jsonString )
String idValue1 = json.get("id").get(0)
String idValue2 = json.get("id").get(1)
log.info("idValue1:" + idValue1)
log.info("idValue2:" + idValue2)
I hope this helps

sagemaker clienterror rows 1-5000 have more fields than expected size 3

I have created a K-means training job with a csv file that I have stored in S3. After a while I receive the following error:
Training failed with the following error: ClientError: Rows 1-5000 in file /opt/ml/input/data/train/features have more fields than expected size 3.
What could be the issue with my file?
Here are the parameters I am passing to sagemaker.create_training_job
TrainingJobName=job_name,
HyperParameters={
'k': '2',
'feature_dim': '2'
},
AlgorithmSpecification={
'TrainingImage': image,
'TrainingInputMode': 'File'
},
RoleArn='arn:aws:iam::<my_acc_number>:role/MyRole',
OutputDataConfig={
"S3OutputPath": output_location
},
ResourceConfig={
'InstanceType': 'ml.m4.xlarge',
'InstanceCount': 1,
'VolumeSizeInGB': 20,
},
InputDataConfig=[
{
'ChannelName': 'train',
'ContentType': 'text/csv',
"CompressionType": "None",
"RecordWrapperType": "None",
'DataSource': {
'S3DataSource': {
'S3DataType': 'S3Prefix',
'S3Uri': data_location,
'S3DataDistributionType': 'FullyReplicated'
}
}
}
],
StoppingCondition={
'MaxRuntimeInSeconds': 600
}
I've seen this issue appear when doing unsupervised learning, such as the above example using clustering. For CSV input the algorithm assumes by default that one column is a label, so with feature_dim set to 2 it expects 2 features plus 1 label, i.e. 3 fields per row, which matches the "expected size 3" in the error. You can address this by setting label_size=0 in the ContentType parameter of the SageMaker API call, within the InputDataConfig branch.
Here's an example of what the relevant section of the call might look like:
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "some/path/in/s3",
"S3DataDistributionType": "ShardedByS3Key"
}
},
"CompressionType": "None",
"RecordWrapperType": "None",
"ContentType": "text/csv;label_size=0"
}
]
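In the boto3-style parameters from the question, the same fix is a one-line change inside the train channel of InputDataConfig:

'ContentType': 'text/csv;label_size=0',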
Make sure your .csv doesn't have column headers, and that the label is the first column.
Also make sure the values for your hyperparameters are accurate, i.e. feature_dim means the number of features in your set. If you give it the wrong value, it'll break.
Here's a list of SageMaker kNN hyperparameters and their meanings: https://docs.aws.amazon.com/sagemaker/latest/dg/kNN_hyperparameters.html
