How to use multiple GPUs for training? - distributed

I am simply trying to understand how to format a config file to allow multiple GPUs/distributed training to take place via the "train" command.
The only clear tutorial out there (Tutorial: How to train with multiple GPUs in AllenNLP) seems to target much older versions of AllenNLP and no longer works, as the "distributed" argument is now a bool and will not accept a list of CUDA device IDs.
"trainer": {
// Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
"use_amp": true,
"cuda_devices": [7,8],
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"eps": 1e-06,
"correct_bias": false,
"weight_decay": 0.1,
"parameter_groups": [
// Apply weight decay to pre-trained params, excluding LayerNorm params and biases
[["bias", "LayerNorm\\.weight", "layer_norm\\.weight"], {"weight_decay": 0}],
],
},
"callbacks":[{"type":'tensorboard'}],
"num_epochs": 10,
"checkpointer": {
// A value of null or -1 will save the weights of the model at the end of every epoch
"keep_most_recent_by_count": 2,
},
"grad_norm": 1.0,
"learning_rate_scheduler": {
"type": "slanted_triangular",
},
"distributed": {"cuda_devices": [7,8],},
"world_size": 2,
},
}
Leads to:
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 206, in create_kwargs
constructed_arg = pop_and_construct_arg(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 314, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 363, in construct_arg
raise TypeError(f"Expected {argument_name} to be a {annotation.__name__}.")
TypeError: Expected distributed to be a bool.
Then, trying to shift towards the AllenNLP v2.10 style by setting "distributed" to a bool and providing the cuda_devices as a list leads to the following:
"trainer": {
// Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
"use_amp": true,
"cuda_devices": [7,8],
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"eps": 1e-06,
"correct_bias": false,
"weight_decay": 0.1,
"parameter_groups": [
// Apply weight decay to pre-trained params, excluding LayerNorm params and biases
[["bias", "LayerNorm\\.weight", "layer_norm\\.weight"], {"weight_decay": 0}],
],
},
"callbacks":[{"type":'tensorboard'}],
"num_epochs": 10,
"checkpointer": {
// A value of null or -1 will save the weights of the model at the end of every epoch
"keep_most_recent_by_count": 2,
},
"grad_norm": 1.0,
"learning_rate_scheduler": {
"type": "slanted_triangular",
},
"distributed": true,
"world_size": 2
},
}
With the following error:
File "/home/niallt/DeCLUTR/allennlp/allennlp/commands/train.py", line 786, in from_partial_objects
trainer_ = trainer.construct(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 638, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/niallt/DeCLUTR/allennlp/allennlp/training/gradient_descent_trainer.py", line 1154, in from_partial_objects
ddp_accelerator = TorchDdpAccelerator(cuda_device=cuda_device)
File "/home/niallt/DeCLUTR/allennlp/allennlp/nn/parallel/ddp_accelerator.py", line 138, in __init__
super().__init__(local_rank=local_rank, world_size=world_size, cuda_device=cuda_device)
File "/home/niallt/DeCLUTR/allennlp/allennlp/nn/parallel/ddp_accelerator.py", line 102, in __init__
self.local_rank: int = local_rank if local_rank is not None else dist.get_rank()
File "/home/niallt/venvs/39_declutr/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 844, in get_rank
default_pg = _get_default_group()
File "/home/niallt/venvs/39_declutr/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 429, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
I'm guessing I may just be missing some key arguments here, but I'm struggling to determine which.
Any help would be much appreciated.
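For what it's worth, in recent AllenNLP versions distributed training is normally enabled with a top-level "distributed" section that sits next to "trainer" rather than inside it; the train command then spawns one worker per listed device and derives world_size itself, so neither "cuda_devices" nor "world_size" should appear inside "trainer". A minimal, untested sketch of that layout, reusing the device IDs from above:
{
    // dataset_reader, model, data_loader, etc. as before
    "trainer": {
        "use_amp": true,
        "optimizer": {"type": "huggingface_adamw", "lr": 5e-5},
        "num_epochs": 10,
        // no "cuda_devices", "world_size" or "distributed" entries in here
    },
    "distributed": {
        "cuda_devices": [7, 8],
    },
}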

Related

How to create a Tensorflow Dataset without labels? Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string

Using Tensorflow 2.3, I'm trying to create a tf.data.Dataset without labels.
I have my .png files in a folder './Folder/'. For creating the minimal working sample, I think the only relevant line is the one where I am calling tf.keras.preprocessing.image_dataset_from_directory. The class definition is here.
dataset = tf.keras.preprocessing.image_dataset_from_directory('./Folder/',label_mode=None,batch_size=100)
When the Python interpreter reaches the line above, it returns this error message:
Traceback (most recent call last):
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 465, in _apply_op_helper
values = ops.convert_to_tensor(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1473, in convert_to_tensor
raise ValueError(
ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor 'args_0:0' shape=() dtype=float32>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "04-vaeAnomalyScores.py", line 135, in <module>
historicKLD, encoder, decoder, vae = artVAE_Instance.run_autoencoder() # Train
File "/media/roi/9b168630-3b62-4215-bb7d-fed9ba179dc7/images/largePatches/artvae.py", line 386, in run_autoencoder
trainingDataSet = self.loadImages(self.trainingDir)
File "/media/roi/9b168630-3b62-4215-bb7d-fed9ba179dc7/images/largePatches/artvae.py", line 231, in loadImages
dataset = tf.keras.preprocessing.image_dataset_from_directory(dir[:-1]+'Downscaled/',label_mode=None,batch_size=self.BATCH_SIZE)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/keras/preprocessing/image_dataset.py", line 192, in image_dataset_from_directory
dataset = paths_and_labels_to_dataset(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/keras/preprocessing/image_dataset.py", line 219, in paths_and_labels_to_dataset
img_ds = path_ds.map(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1695, in map
return MapDataset(self, map_func, preserve_cardinality=True)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4041, in __init__
self._map_func = StructuredFunctionWrapper(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
self._function = wrapper_fn.get_concrete_function()
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2938, in get_concrete_function
graph_function = self._get_concrete_function_garbage_collected(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3065, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
ret = _wrapper_helper(*args)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in wrapper
return converted_call(f, args, kwargs, options=options)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 532, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 339, in _call_unconverted
return f(*args, **kwargs)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/keras/preprocessing/image_dataset.py", line 220, in <lambda>
lambda x: path_to_image(x, image_size, num_channels, interpolation))
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/keras/preprocessing/image_dataset.py", line 228, in path_to_image
img = io_ops.read_file(path)
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_io_ops.py", line 574, in read_file
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/home/roi/.local/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 492, in _apply_op_helper
raise TypeError("%s expected type of %s." %
TypeError: Input 'filename' of 'ReadFile' Op has type float32 that does not match expected type of string.
Thank you so much for your help.
One way I found to fix this is to put all your images in another sub-directory inside the directory whose path you are feeding to image_dataset_from_directory.
Taking your example, you would create a new folder, let's call it new_folder, inside of ./Folder/ where you would put all your images, such that now the path to all your images is ./Folder/new_folder/. Then you can call the image_dataset_from_directory method with the exact same arguments as you have done in your question:
tf.keras.preprocessing.image_dataset_from_directory(
    './Folder/',
    label_mode=None,
    batch_size=100
)
I found this to work for me so hopefully someone else will also find it helpful!
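For clarity, the layout this workaround assumes looks roughly like the sketch below (file names are made up for illustration); the call itself stays exactly as in the question:
# Assumed directory layout after moving the images:
#
#   ./Folder/
#       new_folder/
#           img_0001.png
#           img_0002.png
#
# With label_mode=None the subdirectory is only used to locate the files;
# no labels are returned.
import tensorflow as tf

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    './Folder/',
    label_mode=None,
    batch_size=100)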

Datastore error: BadValueError: Expected integer, got [0, 1, 2, 3]

Others have reported a similar error, but the solutions given do not solve my problem.
For example there is a good answer here. The answer in the link mentions how ndb changes from a first use to a later use and suggests there is a problem because a first run produces a None in the Datastore. I cannot reproduce or see that happening in the Datastore for my sdk, but that may be because I am running it here from the interactive console.
I am pretty sure I got an initial good run in the GAE interactive console, but every run since then has failed with the error in the title of this question.
I have left the print statements in the following code because they show good results and assure me that the error is occurring in the put() at the very end.
from google.appengine.ext import ndb

class Account(ndb.Model):
    week = ndb.IntegerProperty(repeated=True)
    weeksNS = ndb.IntegerProperty(repeated=True)
    weeksEW = ndb.IntegerProperty(repeated=True)

terry = Account(week=[], weeksNS=[], weeksEW=[])
terry_key = terry.put()
terry = terry_key.get()
print terry
for t in list(range(4)):  # just dummy input, but like real input
    terry.week.append(t)
print terry.week
region = 1  # same error message for region = 0
if region:
    terry.weeksEW.append(terry.week)
else:
    terry.weeksNS.append(terry.week)
print 'EW' + str(terry.weeksEW)
print 'NS' + str(terry.weeksNS)
terry.week = []
print 'week' + str(terry.week)
terry.put()
The idea of my code is to first build up the terry.week list values incrementally and then later store the whole list to the appropriate region, either NS or EW. So I'm looking for a workaround for this scheme.
The error message is likely of no value but I am reproducing it here.
Traceback (most recent call last):
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/python/runtime/request_handler.py", line 237, in handle_interactive_request
exec(compiled_code, self._command_globals)
File "<string>", line 55, in <module>
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 3458, in _put
return self._put_async(**ctx_options).get_result()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
self.check_success()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/context.py", line 824, in put
key = yield self._put_batcher.add(entity, options)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along
value = gen.send(val)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/context.py", line 358, in _put_tasklet
keys = yield self._conn.async_put(options, datastore_entities)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1858, in async_put
pbs = [entity_to_pb(entity) for entity in entities]
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 697, in entity_to_pb
pb = ent._to_pb()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 3167, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1422, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1192, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1180, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1352, in _apply_to_values
value[:] = map(function, value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1234, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1255, in _call_to_base_type
return call(value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1331, in call
newvalue = method(self, value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1781, in _validate
(value,))
BadValueError: Expected integer, got [0, 1, 2, 3]
I believe the error comes from these lines:
terry.weeksEW.append(terry.week)
terry.weeksNS.append(terry.week)
You are not appending another integer; you are appending a list where an integer is expected.
>>> aaa = [1,2,3]
>>> bbb = [4,5,6]
>>> aaa.append(bbb)
>>> aaa
[1, 2, 3, [4, 5, 6]]
>>>
This fails the ndb.IntegerProperty test.
Try:
terry.weeksEW += terry.week
terry.weeksNS += terry.week
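For comparison, extending the list in place with += keeps everything flat, which is what a repeated IntegerProperty expects:
>>> aaa = [1, 2, 3]
>>> aaa += [4, 5, 6]
>>> aaa
[1, 2, 3, 4, 5, 6]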
EDIT: To save a list of lists, do not use the IntegerProperty(), but instead the JsonProperty(). Better still, the ndb datastore is deprecated, so... I recommend Firestore, which uses JSON objects by default. At least use Cloud Datastore, or Cloud NDB.
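A minimal sketch of the JsonProperty idea from the edit (property names follow the question; untested):
from google.appengine.ext import ndb

class Account(ndb.Model):
    week = ndb.IntegerProperty(repeated=True)
    weeksNS = ndb.JsonProperty()   # will hold a list of lists
    weeksEW = ndb.JsonProperty()

terry = Account(week=[], weeksNS=[], weeksEW=[])
terry.week = [0, 1, 2, 3]
terry.weeksEW.append(terry.week)   # appending a whole list is fine here
terry.put()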

Formatting into CSV JSON file using jq

I've got some data in a file called myfile.json that I need to format using jq. In JSON it looks like this:
{
  "result": [
    {
      "service": "ebsvolume",
      "name": "gtest",
      "resourceIdentifier": "vol-999999999999",
      "accountName": "g-test-acct",
      "vendorAccountId": "12345678912",
      "availabilityZone": "ap-southeast-2c",
      "region": "ap-southeast-2",
      "effectiveHourly": 998.56,
      "totalSpend": 167.7,
      "idle": 0,
      "lastSeen": "2018-08-16T22:00:00Z",
      "volumeType": "io1",
      "state": "in-use",
      "volumeSize": 180,
      "iops": 2000,
      "throughput": 500,
      "lastAttachedTime": "2018-08-08T22:00:00Z",
      "lastAttachedId": "i-086f957ee",
      "recommendations": [
        {
          "action": "Rightsize",
          "preferenceOrder": 2,
          "risk": 0,
          "savingsPct": 91,
          "savings": 189.05,
          "volumeType": "gp2",
          "volumeSize": 120
        },
        {
          "action": "Rightsize",
          "preferenceOrder": 4,
          "risk": 0,
          "savingsPct": 97,
          "savings": 166.23,
          "volumeType": "gp2",
          "volumeSize": 167
        },
        {
          "action": "Rightsize",
          "preferenceOrder": 6,
          "risk": 0,
          "savingsPct": 91,
          "savings": 111.77,
          "volumeType": "gp2",
          "volumeSize": 169
        }
      ]
    }
  ]
}
I have it formatted better with the following
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId] | @csv' ./myfile.json
This nets the following output:
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\""
I figured this out, but it's not exactly what I am trying to achieve. I want to have each recommendation listed underneath on a separate line, and not at the end of the same line.
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[].action] | @csv' ./myfile.json
This nets:
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
What I want is
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",
\"Rightsize\",
\"Rightsize\",
\"Rightsize\""
So I'm not entirely sure how to deal with the array inside the "recommendations" section in jq; I think it might be called unflattening?
You can try this:
jq '.result[] | [ flatten[] | try(.action) // . ] | @csv' file
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
flatten does what it says.
try(.action) // . replaces each element by its .action field when it has one: try suppresses the error that .action raises on scalar values, and the alternative operator // falls back to the original value when .action produces null or nothing.
The filtered values are put into an array in order to get them converted with the @csv filter.
That didn't really work for me, actually; it omitted all the data in the previous array, but thanks!
I ended up with the following. Granted, it doesn't put the Rightsize details on a separate line, but it will have to do:
jq -r '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[][]] | @csv' ./myfile.json
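If the goal really is one CSV row for the volume followed by one row per recommendation, emitting two @csv rows per result should do it; a rough, untested sketch (the field lists are just an illustrative subset):
jq -r '.result[]
       | ([.service, .name, .resourceIdentifier, .volumeType, .volumeSize] | @csv),
         (.recommendations[] | [.action, .volumeType, .volumeSize, .savings] | @csv)' ./myfile.json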

angular can't parse JSON [duplicate]

I have the following strings:
json_string = '{u"favorited": false, u"contributors": null}'
json_string1 = '{"favorited": false, "contributors": null}'
The following json load works fine.
json.loads(json_string1 )
But the following json load gives me a ValueError. How can I fix this?
json.loads(json_string)
ValueError: Expecting property name: line 1 column 2 (char 1)
I faced the same problem with strings I received from a customer. The strings arrived with u's. I found a workaround using the ast package:
import ast
import json
my_str='{u"favorited": false, u"contributors": null}'
my_str=my_str.replace('"',"'")
my_str=my_str.replace(': false',': False')
my_str=my_str.replace(': null',': None')
my_str = ast.literal_eval(my_str)
my_dumps=json.dumps(my_str)
my_json=json.loads(my_dumps)
Note the replacement of "false" and "null" by "False" and "None", since literal_eval only recognizes certain Python literal structures. This means you may need more replacements in your code, depending on the strings you receive.
You could remove the u prefix from the string using a regex and then load the JSON:
import re
import json

s = '{u"favorited": false, u"contributors": null}'
json_string = re.sub('(\W)\s*u"', r'\1"', s)
json.loads(json_string)
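For illustration, applying that substitution to the string from the question and loading the result (Python 2 session; the comparison is just to show the parse succeeds):
>>> import re, json
>>> s = '{u"favorited": false, u"contributors": null}'
>>> json_string = re.sub('(\W)\s*u"', r'\1"', s)
>>> json.loads(json_string) == {'favorited': False, 'contributors': None}
True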
Use json.dumps to convert a Python dictionary to a string, not str. Then you can expect json.loads to work:
Incorrect:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = str(D)
>>> s
"{u'favorited': False, u'contributors': None}"
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\dev\Python27\lib\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "D:\dev\Python27\lib\json\decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\dev\Python27\lib\json\decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Correct:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = json.dumps(D)
>>> s
'{"favorited": false, "contributors": null}'
>>> json.loads(s)
{u'favorited': False, u'contributors': None}

In Google App Engine, how to check input validity of Key created by urlsafe?

Suppose I create a key from a user-supplied urlsafe string:
key = ndb.Key(urlsafe=some_user_input)
How can I check whether some_user_input is valid?
My current experiment shows that the statement above will throw a ProtocolBufferDecodeError ("Unable to merge from string.") if some_user_input is invalid, but I could not find anything about this in the API docs. Could someone kindly confirm this, and point me to a better way of checking user input validity instead of catching the exception?
Thanks a lot!
If you try to construct a Key with an invalid urlsafe parameter
key = ndb.Key(urlsafe='bogus123')
you will get an error like
Traceback (most recent call last):
File "/opt/google/google_appengine/google/appengine/runtime/wsgi.py", line 240, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "/opt/google/google_appengine/google/appengine/runtime/wsgi.py", line 299, in _LoadHandler
handler, path, err = LoadObject(self._handler)
File "/opt/google/google_appengine/google/appengine/runtime/wsgi.py", line 85, in LoadObject
obj = __import__(path[0])
File "/home/tim/git/project/main.py", line 10, in <module>
from src.tim import handlers as handlers_
File "/home/tim/git/project/src/tim/handlers.py", line 42, in <module>
class ResetHandler(BaseHandler):
File "/home/tim/git/project/src/tim/handlers.py", line 47, in ResetHandler
key = ndb.Key(urlsafe='bogus123')
File "/opt/google/google_appengine/google/appengine/ext/ndb/key.py", line 212, in __new__
self.__reference = _ConstructReference(cls, **kwargs)
File "/opt/google/google_appengine/google/appengine/ext/ndb/utils.py", line 142, in positional_wrapper
return wrapped(*args, **kwds)
File "/opt/google/google_appengine/google/appengine/ext/ndb/key.py", line 642, in _ConstructReference
reference = _ReferenceFromSerialized(serialized)
File "/opt/google/google_appengine/google/appengine/ext/ndb/key.py", line 773, in _ReferenceFromSerialized
return entity_pb.Reference(serialized)
File "/opt/google/google_appengine/google/appengine/datastore/entity_pb.py", line 1710, in __init__
if contents is not None: self.MergeFromString(contents)
File "/opt/google/google_appengine/google/net/proto/ProtocolBuffer.py", line 152, in MergeFromString
self.MergePartialFromString(s)
File "/opt/google/google_appengine/google/net/proto/ProtocolBuffer.py", line 168, in MergePartialFromString
self.TryMerge(d)
File "/opt/google/google_appengine/google/appengine/datastore/entity_pb.py", line 1839, in TryMerge
d.skipData(tt)
File "/opt/google/google_appengine/google/net/proto/ProtocolBuffer.py", line 677, in skipData
raise ProtocolBufferDecodeError, "corrupted"
ProtocolBufferDecodeError: corrupted
Interesting here is
File "/opt/google/google_appengine/google/appengine/ext/ndb/key.py", line 773, in _ReferenceFromSerialized
return entity_pb.Reference(serialized)
which is the last code executed in the key.py module:
def _ReferenceFromSerialized(serialized):
    """Construct a Reference from a serialized Reference."""
    if not isinstance(serialized, basestring):
        raise TypeError('serialized must be a string; received %r' % serialized)
    elif isinstance(serialized, unicode):
        serialized = serialized.encode('utf8')
    return entity_pb.Reference(serialized)
serialized here being the decoded urlsafe string, you can read more about it in the link to the source code.
Another interesting one is the last one:
File "/opt/google/google_appengine/google/appengine/datastore/entity_pb.py", line 1839, in TryMerge
in the entity_pb.py module which looks like this
def TryMerge(self, d):
    while d.avail() > 0:
        tt = d.getVarInt32()
        if tt == 106:
            self.set_app(d.getPrefixedString())
            continue
        if tt == 114:
            length = d.getVarInt32()
            tmp = ProtocolBuffer.Decoder(d.buffer(), d.pos(), d.pos() + length)
            d.skip(length)
            self.mutable_path().TryMerge(tmp)
            continue
        if tt == 162:
            self.set_name_space(d.getPrefixedString())
            continue
        if (tt == 0): raise ProtocolBuffer.ProtocolBufferDecodeError
        d.skipData(tt)
which is where the actual attempt to merge the input into a Key is made.
You can see in the source code that not a whole lot can go wrong while constructing a Key from an urlsafe parameter. First it checks whether the input is a string; if it isn't, a TypeError is raised. If it is a string but not 'valid', a ProtocolBufferDecodeError is indeed raised.
My current experiment shows that statement above will throw ProtocolBufferDecodeError (Unable to merge from string.) exception if the some_user_input is invalid, but could not find anything about this from the API. Could someone kindly confirm this
Sort of confirmed; we now know that a TypeError can also be raised.
and point me some better way for user input validity checking instead of catching the exception?
This is an excellent way to check validity! Why do the checks yourself if they are already done by App Engine? A code snippet could look like this (not working code, just an example):
def get(self):
    # first, fetch the user_input from somewhere
    try:
        key = ndb.Key(urlsafe=user_input)
    except TypeError:
        return 'Sorry, only string is allowed as urlsafe input'
    except ProtocolBufferDecodeError:
        return 'Sorry, the urlsafe string seems to be invalid'
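Putting that together, a small standalone helper might look like this (the ProtocolBufferDecodeError import path is taken from the traceback above; treat it as an assumption for your SDK layout):
from google.appengine.ext import ndb
from google.net.proto.ProtocolBuffer import ProtocolBufferDecodeError


def key_from_urlsafe_or_none(user_input):
    """Return an ndb.Key for a valid urlsafe string, or None if the input is invalid."""
    try:
        return ndb.Key(urlsafe=user_input)
    except TypeError:
        # not a string at all
        return None
    except ProtocolBufferDecodeError:
        # a string, but not a valid serialized Reference
        return None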
