I retrain a mobilenet v2 modell using my own images and i can label new images with the output in python (https://www.tensorflow.org/hub/tutorials/image_retraining). Loading the file works, but during prediction it fails with (concole.log of Firefox and Chromium):
The dict provided in model.execute(dict) has keys: [images] not part of model graph.
I retrain a modell using the provided retrain.py
python retrain.py --image_dir flower_photos/ --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/2 --random_brightness 10 --how_many_training_steps 100
inside flower_photos there are folders with the name of the images and inside the appropriate images.
flower_photos
--- Huflattich
------- 1.jpg
------- 2.jpg
....
--- Buschwindröschen
------- 1.jpg
------- 2.jpg
I can convert this model using
tensorflowjs_converter --input_format=tf_frozen_model --output_node_names='module_apply_default/MobilenetV2/Logits/output' /tmp/output_graph.pb Mobilenetv2/web_model
but this isn't working inside the provided example from https://github.com/tensorflow/tfjs-examples/tree/master/mobilenet
If i convert the original mobilenet v2 using
tensorflowjs_converter --input_format=tf_hub 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/2' mobilenetv2/web_model
i can load inside the provided example.
In the end, the programme should detect different early bloomer flowers shown by the webcam and classify. This should be a PWA for students and motivate them to experience nature.
Tensorflow.js currently has two types of models,
Layers model which allows training, you can load them with tf.loadModel(...)
Models that are converted from TensorFlow generated models, which does not allow training. This is what you have, you should use tf.loadFrozenModel(...)
here is an example for loading the frozen model and performing a prediction on an image. https://github.com/tensorflow/tfjs-converter/tree/master/demo/mobilenet
Related
I want to train yolov5 by combining the coco dataset and the custom dataset created with roboflow. How do I merge datasets?
can I ask why you’re looking to combine the two?
Are you just wanting to do Transfer Learning to accelerate your model training and inference performance? If that’s the case, you can just use Train From Checkpoint, with Roboflow Train, and use the COCO checkpoint - https://docs.roboflow.com/train
Otherwise, is your goal to detect your custom classes alongside all of the classes in COCO?
Create a data configuration file combined_datasets.yaml that combines multiple datasets like this:
path: ../../yolov5_datasets # realative data root dir
train: # train images (relative to 'path')
- coco_dataset/train/images # use both coco
- custom_dataset/train/images # and you custom dataset for train
val: # val images
- coco_dataset/val/images # use both coco
- custom_dataset/val/images # and you custom dataset for eval
# Classes
nc: N # number of classes
names: [ 'name_0', 'name_1', '...', 'name_N-1' ] # class names
Specify it for training:
python train.py --data combined_datasets.yaml --cfg yolov5s.yaml --weights yolov5s.pt --device 2 --img 320
Let's say I have successfully trained a model on some training data for 10 epochs. How can I then access the very same model and train for a further 10 epochs?
In the docs it suggests "you need to specify a checkpoint output path through hyperparameters" --> how?
# define my estimator the standard way
huggingface_estimator = HuggingFace(
entry_point='train.py',
source_dir='./scripts',
instance_type='ml.p3.2xlarge',
instance_count=1,
role=role,
transformers_version='4.10',
pytorch_version='1.9',
py_version='py38',
hyperparameters = hyperparameters,
metric_definitions=metric_definitions
)
# train the model
huggingface_estimator.fit(
{'train': training_input_path, 'test': test_input_path}
)
If I run huggingface_estimator.fit again it will just start the whole thing over again and overwrite my previous training.
You can find the relevant checkpoint save/load code in Spot Instances - Amazon SageMaker x Hugging Face Transformers.
(The example enables Spot instances, but you can use on-demand).
In hyperparameters you set: 'output_dir':'/opt/ml/checkpoints'.
You define a checkpoint_s3_uri in the Estimator (which is unique to the series of jobs you'll run).
You add code for train.py to support checkpointing:
from transformers.trainer_utils import get_last_checkpoint
# check if checkpoint existing if so continue training
if get_last_checkpoint(args.output_dir) is not None:
logger.info("***** continue training *****")
last_checkpoint = get_last_checkpoint(args.output_dir)
trainer.train(resume_from_checkpoint=last_checkpoint)
else:
trainer.train()
I'm very new to TFX, but have an apparently-working ML Pipeline which is to be used via BulkInferrer. That seems to produce output exclusively in Protobuf format, but since I'm running bulk inference I want to pipe the results to a database instead. (DB output seems like it should be the default for bulk inference, since both Bulk Inference & DB access take advantage of parallelization... but Protobuf is a per-record, serialized format.)
I assume I could use something like Parquet-Avro-Protobuf to do the conversion (though that's in Java and the rest of the pipeline's in Python), or I could write something myself to consume all the protobuf messages one-by-one, convert them into JSON, deserialize the JSON into a list of dicts, and load the dict into a Pandas DataFrame, or store it as a bunch of key-value pairs which I treat like a single-use DB... but that sounds like a lot of work and pain involving parallelization and optimization for a very common use case. The top-level Protobuf message definition is Tensorflow's PredictionLog.
This must be a common use case, because TensorFlowModelAnalytics functions like this one consume Pandas DataFrames. I'd rather be able to write directly to a DB (preferably Google BigQuery), or a Parquet file (since Parquet / Spark seems to parallelize better than Pandas), and again, those seem like they should be common use cases, but I haven't found any examples. Maybe I'm using the wrong search terms?
I also looked at the PredictExtractor, since "extracting predictions" sounds close to what I want... but the official documentation appears silent on how that class is supposed to be used. I thought TFTransformOutput sounded like a promising verb, but instead it's a noun.
I'm clearly missing something fundamental here. Is there a reason no one wants to store BulkInferrer results in a database? Is there a configuration option that allows me to write the results to a DB? Maybe I want to add a ParquetIO or BigQueryIO instance to the TFX pipeline? (TFX docs say it uses Beam "under the hood" but that doesn't say much about how I should use them together.) But the syntax in those documents looks sufficiently different from my TFX code that I'm not sure if they're compatible?
Help?
(Copied from the related issue for greater visibility)
After some digging, here is an alternative approach, which assumes no knowledge of the feature_spec before-hand. Do the following:
Set the BulkInferrer to write to output_examples rather than inference_result by adding a output_example_spec to the component construction.
Add a StatisticsGen and a SchemaGen component in the main pipeline right after the BulkInferrer to generate a schema for the aforementioned output_examples
Use the artifacts from SchemaGen and BulkInferrer to read the TFRecords and do whatever is neccessary.
bulk_inferrer = BulkInferrer(
....
output_example_spec=bulk_inferrer_pb2.OutputExampleSpec(
output_columns_spec=[bulk_inferrer_pb2.OutputColumnsSpec(
predict_output=bulk_inferrer_pb2.PredictOutput(
output_columns=[bulk_inferrer_pb2.PredictOutputCol(
output_key='original_label_name',
output_column='output_label_column_name', )]))]
))
statistics = StatisticsGen(
examples=bulk_inferrer.outputs.output_examples
)
schema = SchemaGen(
statistics=statistics.outputs.output,
)
After that, one can do the following:
import tensorflow as tf
from tfx.utils import io_utils
from tensorflow_transform.tf_metadata import schema_utils
# read schema from SchemaGen
schema_path = '/path/to/schemagen/schema.pbtxt'
schema_proto = io_utils.SchemaReader().read(schema_path)
spec = schema_utils.schema_as_feature_spec(schema_proto).feature_spec
# read inferred results
data_files = ['/path/to/bulkinferrer/output_examples/examples/examples-00000-of-00001.gz']
dataset = tf.data.TFRecordDataset(data_files, compression_type='GZIP')
# parse dataset with spec
def parse(raw_record):
return tf.io.parse_example(raw_record, spec)
dataset = dataset.map(parse)
At this point, the dataset is like any other parsed dataset, so its trivial to write a CSV, or to a BigQuery table or whatever from there. It certainly helped us in ZenML with our BatchInferencePipeline.
Answering my own question here to document what we did, even though I think #Hamza Tahir's answer below is objectively better. This may provide an option for other situations where it's necessary to change the operation of an out-of-the-box TFX component. It's hacky though:
We copied and edited the file tfx/components/bulk_inferrer/executor.py, replacing this transform in the _run_model_inference() method's internal pipeline:
| 'WritePredictionLogs' >> beam.io.WriteToTFRecord(
os.path.join(inference_result.uri, _PREDICTION_LOGS_FILE_NAME),
file_name_suffix='.gz',
coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
with this one:
| 'WritePredictionLogsBigquery' >> beam.io.WriteToBigQuery(
'our_project:namespace.TableName',
schema='SCHEMA_AUTODETECT',
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
custom_gcs_temp_location='gs://our-storage-bucket/tmp',
temp_file_format='NEWLINE_DELIMITED_JSON',
ignore_insert_ids=True,
)
(This works because when you import the BulkInferrer component, the per-node work gets farmed out to these executors running on the worker nodes, and TFX copies its own library onto those nodes. It doesn't copy everything from user-space libaries, though, which is why we couldn't just subclass BulkInferrer and import our custom version.)
We had to make sure the table at 'our_project:namespace.TableName' had a schema compatible with the model's output, but didn't have to translate that schema into JSON / AVRO.
In theory, my group would like to make a pull request with TFX built around this, but for now we're hard-coding a couple key parameters, and don't have the time to get this to a real public / production state.
I'm a little late to this party but this is some code I use for this task:
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2
import pandas as pd
def parse_prediction_logs(inference_filenames: List[Text]): -> pd.DataFrame
"""
Args:
inference files: tf.io.gfile.glob(Inferrer artifact uri)
Returns:
a dataframe of userids, predictions, and features
"""
def parse_log(pbuf):
# parse the protobuf
message = prediction_log_pb2.PredictionLog()
message.ParseFromString(pbuf)
# my model produces scores and classes and I extract the topK classes
predictions = [x.decode() for x in (message
.predict_log
.response
.outputs['output_2']
.string_val
)[:10]]
# here I parse the input tf.train.Example proto
inputs = tf.train.Example()
inputs.ParseFromString(message
.predict_log
.request
.inputs['input_1'].string_val[0]
)
# you can pull out individual features like this
uid = inputs.features.feature["userId"].bytes_list.value[0].decode()
feature1 = [
x.decode() for x in inputs.features.feature["feature1"].bytes_list.value
]
feature2 = [
x.decode() for x in inputs.features.feature["feature2"].bytes_list.value
]
return (uid, predictions, feature1, feature2)
return pd.DataFrame(
[parse_log(x) for x in
tf.data.TFRecordDataset(inference_filenames, compression_type="GZIP").as_numpy_iterator()
], columns = ["userId", "predictions", "feature1", "feature2"]
)
I am new to swift but I have made an android app where a string array is selected from an xml file. This is a large xml file that contains a lot of string arrays and the app gets the relevant string array based on a user selection.
I am now trying to develop the same app for iOS using swift. I would like to use the same xml file but I can not see and easy way to get the correct array. For example, part of the xml looks like this
<string-array name="OCR_Businessstudies_A_Topics">
<item>1. Business objectives and strategic decisions</item>
<item>2. External influences facing businesses</item>
<item>3. Marketing and marketing strategies</item>
<item>4. Operational strategy</item>
<item>5. Human resources</item>
<item>6. Accounting and financial considerations</item>
<item>7. The global environment of business</item>
</string-array>
<string-array name="OCR_Businessstudies_AS_Topics">
<item>1. Business objectives and strategic decisions</item>
<item>2. External influences facing businesses</item>
<item>3. Marketing and marketing strategies</item>
<item>4. Operational strategy</item>
<item>5. Human resources</item>
<item>6. Accounting and financial considerations</item>
</string-array>
If I have the string "OCR_Businessstudies_A_Topics" how do i get the "OCR_Businessstudies_A_Topics" array from the xml file.
This is very straight forward in android and although I have used online tutorials for swift it seems like I have to parse the xml file but do not seem to be getting anywhere.
Is there a better approach than trying to parse the whole xml fie?
Thanks
Barry
You can write your own XML parser, conforming to NSXMLParser or use a library like HTMLReader:
let fileURL = NSBundle.mainBundle().URLForResource("data", withExtension: "xml")!
let xmlData = NSData(contentsOfURL: fileURL)!
let topic = "OCR_Businessstudies_A_Topics"
let document = HTMLDocument(data: xmlData, contentTypeHeader: "text/xml")
for item in document.nodesMatchingSelector("string-array[name='\(topic)'] item") {
print(item.textContent)
}
I'm working on a simple rails app that does SMS. I am leveraging Twilio for this via the twilio_ruby gem. I have 10 different phone numbers that I want to be able to send SMS from randomly.
I know if I do something like this:
numbers = ["281-555-1212", "821-442-2222", "810-440-2293"]
numbers.sample
281-555-1212
It will randomly pull one of the values from the array, which is exactly what I want. The problem is I don't want to hardcode all 10 of these numbers into the app or commit them to version control.
So I'm listing them in yaml (secrets.yml) along with my Twilio SID/Token. How can I build an array out of the 10 yaml fields i.e. twilio_num_1, twilio_num_2, etc, etc so that I can call numbers.sample?
Or is there a better way to do this?
You can also use
twilio_numbers:
- 281-555-1122
- 817-444-2222
- 802-333-2222
thus you don't have to write the numbers in one line.
Figured this out through trial and error.
In secrets.yml
twilio_numbers: ["281-555-1122","817-444-2222","802-333-2222"]
In my code:
Rails.application.secrets.twilio_numbers.sample
Works like a charm.
create a file: config/twilio_numbers.yml
---
- 281-555-1122
- 817-444-2222
- 802-333-2222
and load it in your config/application.rb like this:
config.twilio_numbers = YAML.load_file 'config/twilio_numbers.yml'
you can then access the array from inside any file like this:
Rails.application.config.twilio_numbers
=> ["281-555-1122", "817-444-2222", "802-333-2222"]