I have strings in the following two formats:
json_string = '{u"favorited": false, u"contributors": null}'
json_string1 = '{"favorited": false, "contributors": null}'
The following json.loads call works fine:
json.loads(json_string1)
But the following call gives me a ValueError. How can I fix it?
json.loads(json_string)
ValueError: Expecting property name: line 1 column 2 (char 1)
I faced the same problem with strings I received from a customer. The strings arrived with u prefixes. I found a workaround using the ast module:
import ast
import json

my_str = '{u"favorited": false, u"contributors": null}'
# Turn the string into a valid Python literal: swap the quote style and
# replace the JSON literals false/null with their Python equivalents.
my_str = my_str.replace('"', "'")
my_str = my_str.replace(': false', ': False')
my_str = my_str.replace(': null', ': None')
my_dict = ast.literal_eval(my_str)   # now an ordinary Python dict
my_dumps = json.dumps(my_dict)       # serialize it back to valid JSON text
my_json = json.loads(my_dumps)
Note the replacement of "false" and "null" by "False" and "None": literal_eval only recognizes Python literal structures, not JSON ones. This means you may need more replacements in your code, depending on the strings you receive.
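As a side note, the quote swap is not strictly necessary, since u"..." is itself a valid Python string literal; a shorter variant of the same idea (a sketch, assuming the only JSON-specific literals present are false and null):

import ast

s = '{u"favorited": false, u"contributors": null}'
d = ast.literal_eval(s.replace(': false', ': False').replace(': null', ': None'))
# d == {'favorited': False, 'contributors': None}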
You could remove the u prefix from the string with a regular expression and then load the JSON:
import re
import json

s = '{u"favorited": false, u"contributors": null}'
json_string = re.sub(r'(\W)\s*u"', r'\1"', s)
json.loads(json_string)
Use json.dumps to convert a Python dictionary to a string, not str. Then you can expect json.loads to work:
Incorrect:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = str(D)
>>> s
"{u'favorited': False, u'contributors': None}"
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\dev\Python27\lib\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "D:\dev\Python27\lib\json\decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\dev\Python27\lib\json\decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Correct:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = json.dumps(D)
>>> s
'{"favorited": false, "contributors": null}'
>>> json.loads(s)
{u'favorited': False, u'contributors': None}
I am simply trying to understand how to format a config file to allow multiple GPUs/distributed training to take place via the "train" command.
The only clear tutorial out there, Tutorial: How to train with multiple GPUs in AllenNLP, seemingly targets much older versions of AllenNLP and does not work, since the "distributed" argument is now a bool and will not accept a list of CUDA device IDs.
"trainer": {
// Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
"use_amp": true,
"cuda_devices": [7,8],
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"eps": 1e-06,
"correct_bias": false,
"weight_decay": 0.1,
"parameter_groups": [
// Apply weight decay to pre-trained params, excluding LayerNorm params and biases
[["bias", "LayerNorm\\.weight", "layer_norm\\.weight"], {"weight_decay": 0}],
],
},
"callbacks":[{"type":'tensorboard'}],
"num_epochs": 10,
"checkpointer": {
// A value of null or -1 will save the weights of the model at the end of every epoch
"keep_most_recent_by_count": 2,
},
"grad_norm": 1.0,
"learning_rate_scheduler": {
"type": "slanted_triangular",
},
"distributed": {"cuda_devices": [7,8],},
"world_size": 2,
},
}
Leads to:
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 206, in create_kwargs
constructed_arg = pop_and_construct_arg(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 314, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 363, in construct_arg
raise TypeError(f"Expected {argument_name} to be a {annotation.__name__}.")
TypeError: Expected distributed to be a bool.
Then, trying to shift towards AllenNLP v2.10 by setting "distributed" to a bool and providing the cuda_devices as a list, leads to the following:
"trainer": {
// Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
"use_amp": true,
"cuda_devices": [7,8],
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"eps": 1e-06,
"correct_bias": false,
"weight_decay": 0.1,
"parameter_groups": [
// Apply weight decay to pre-trained params, excluding LayerNorm params and biases
[["bias", "LayerNorm\\.weight", "layer_norm\\.weight"], {"weight_decay": 0}],
],
},
"callbacks":[{"type":'tensorboard'}],
"num_epochs": 10,
"checkpointer": {
// A value of null or -1 will save the weights of the model at the end of every epoch
"keep_most_recent_by_count": 2,
},
"grad_norm": 1.0,
"learning_rate_scheduler": {
"type": "slanted_triangular",
},
"distributed": true,
"world_size": 2
},
}
With the following error:
File "/home/niallt/DeCLUTR/allennlp/allennlp/commands/train.py", line 786, in from_partial_objects
trainer_ = trainer.construct(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
File "/home/niallt/DeCLUTR/allennlp/allennlp/common/from_params.py", line 638, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/niallt/DeCLUTR/allennlp/allennlp/training/gradient_descent_trainer.py", line 1154, in from_partial_objects
ddp_accelerator = TorchDdpAccelerator(cuda_device=cuda_device)
File "/home/niallt/DeCLUTR/allennlp/allennlp/nn/parallel/ddp_accelerator.py", line 138, in __init__
super().__init__(local_rank=local_rank, world_size=world_size, cuda_device=cuda_device)
File "/home/niallt/DeCLUTR/allennlp/allennlp/nn/parallel/ddp_accelerator.py", line 102, in __init__
self.local_rank: int = local_rank if local_rank is not None else dist.get_rank()
File "/home/niallt/venvs/39_declutr/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 844, in get_rank
default_pg = _get_default_group()
File "/home/niallt/venvs/39_declutr/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 429, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
I'm guessing I may just be missing some key arguments here, but I'm struggling to determine what. Any help would be much appreciated.
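For reference, as far as I can tell from the 2.x configs, the multi-GPU settings normally live in a top-level "distributed" block of the config rather than inside "trainer" (which is why the trainer complains about the bool), and "cuda_devices"/"world_size" are not repeated inside "trainer". A minimal sketch of that layout, assuming the rest of the config (dataset_reader, model, data_loader) stays as it was:

{
    // dataset_reader, model, data_loader, etc. as before ...
    "distributed": {
        "cuda_devices": [7, 8],
    },
    "trainer": {
        "use_amp": true,
        "optimizer": {"type": "huggingface_adamw", "lr": 5e-5},
        "num_epochs": 10,
    },
}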
Hi all, ijson newbie here. I have a very large .json file (168 GB). I want to collect all possible keys, but some values in the file are written as NaN. ijson creates a generator and outputs dictionaries; when a specific item is reached, it throws an error. How can I get the value as a string instead of a dictionary? I tried parser = ijson.items(input_file, '', multiple_values=True, map_type=str), but it didn't help.
import json
import ijson

def parse_json(json_filename):
    with open('max_data_error.txt', 'w') as outfile:
        with open(json_filename, 'r') as input_file:
            parser = ijson.items(input_file, '', multiple_values=True)
            max_keys_list = list()
            for value in parser:
                for i in json.loads(json.dumps(value, ensure_ascii=False, default=str)):
                    if i not in max_keys_list:
                        max_keys_list.append(i)
                print(value)
            print(max_keys_list)
            for keys_item in max_keys_list:
                outfile.write(keys_item + '\n')

if __name__ == '__main__':
    parse_json('./email/emailrecords.bson.json')
Traceback (most recent call last):
File "panda read.py", line 29, in <module>
parse_json('./email/emailrecords.bson.json')
File "panda read.py", line 17, in parse_json
for value in parser:
ijson.common.IncompleteJSONError: lexical error: invalid char in json text.
litecashwire.com","lastname":NaN,"firstname":"Mia","zip":"87
(right here) ------^
Your file is not valid JSON (NaN is not a valid JSON value); therefore any JSON parsing library will complain about it, one way or another, unless it has an extension to handle this non-standard content.
The ijson FAQ found in the project description has a question about invalid UTF-8 characters and how to deal with them. Those same answers apply here, so I would suggest you go and try one of those.
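One way to apply that idea here is to wrap the text-mode file object and rewrite the offending tokens before ijson sees them. A rough sketch of my own (not taken from the ijson docs; it assumes NaN never appears inside a quoted string value and that each read returns more than a few characters):

import re
import ijson

class NaNToNullReader:
    # File-like wrapper that rewrites bare NaN tokens to null on the fly.
    def __init__(self, f):
        self.f = f
        self._tail = ''
    def read(self, size=-1):
        chunk = self.f.read(size)
        data = self._tail + chunk
        if chunk:
            # Hold back the last few characters so a NaN split across
            # two reads is still rewritten on the next call.
            self._tail = data[-3:]
            data = data[:-3]
        else:
            self._tail = ''
        return re.sub(r'\bNaN\b', 'null', data)

with open('./email/emailrecords.bson.json', 'r') as input_file:
    parser = ijson.items(NaNToNullReader(input_file), '', multiple_values=True)
    for value in parser:
        pass  # same key-collection loop as in the question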
Others have reported a similar error, but the solutions given do not solve my problem.
For example there is a good answer here. The answer in the link mentions how ndb changes from a first use to a later use and suggests there is a problem because a first run produces a None in the Datastore. I cannot reproduce or see that happening in the Datastore for my sdk, but that may be because I am running it here from the interactive console.
I am pretty sure I got an initial good run with the GAE interactive console, but every run since then has failed with the error in the title of this question.
I have left the print statements in the following code because they show good results and assure me that the error is occurring in the put() at the very end.
from google.appengine.ext import ndb

class Account(ndb.Model):
    week = ndb.IntegerProperty(repeated=True)
    weeksNS = ndb.IntegerProperty(repeated=True)
    weeksEW = ndb.IntegerProperty(repeated=True)

terry = Account(week=[], weeksNS=[], weeksEW=[])
terry_key = terry.put()
terry = terry_key.get()
print terry
for t in list(range(4)):  # just dummy input, but like real input
    terry.week.append(t)
print terry.week
region = 1  # same error message for region = 0
if region:
    terry.weeksEW.append(terry.week)
else:
    terry.weeksNS.append(terry.week)
print 'EW' + str(terry.weeksEW)
print 'NS' + str(terry.weeksNS)
terry.week = []
print 'week' + str(terry.week)
terry.put()
The idea of my code is to first build up the terry.week list values incrementally and then later store the whole list to the appropriate region, either NS or EW. So I'm looking for a workaround for this scheme.
The error message is likely of no value but I am reproducing it here.
Traceback (most recent call last):
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/python/runtime/request_handler.py", line 237, in handle_interactive_request
exec(compiled_code, self._command_globals)
File "<string>", line 55, in <module>
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 3458, in _put
return self._put_async(**ctx_options).get_result()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
self.check_success()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/context.py", line 824, in put
key = yield self._put_batcher.add(entity, options)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along
value = gen.send(val)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/context.py", line 358, in _put_tasklet
keys = yield self._conn.async_put(options, datastore_entities)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1858, in async_put
pbs = [entity_to_pb(entity) for entity in entities]
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 697, in entity_to_pb
pb = ent._to_pb()
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 3167, in _to_pb
prop._serialize(self, pb, projection=self._projection)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1422, in _serialize
values = self._get_base_value_unwrapped_as_list(entity)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1192, in _get_base_value_unwrapped_as_list
wrapped = self._get_base_value(entity)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1180, in _get_base_value
return self._apply_to_values(entity, self._opt_call_to_base_type)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1352, in _apply_to_values
value[:] = map(function, value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1234, in _opt_call_to_base_type
value = _BaseValue(self._call_to_base_type(value))
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1255, in _call_to_base_type
return call(value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1331, in call
newvalue = method(self, value)
File "/Users/brian/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/model.py", line 1781, in _validate
(value,))
BadValueError: Expected integer, got [0, 1, 2, 3]
I believe the error comes from these lines:
terry.weeksEW.append(terry.week)
terry.weeksNS.append(terry.week)
You are not appending another integer; you are appending a list where an integer is expected.
>>> aaa = [1,2,3]
>>> bbb = [4,5,6]
>>> aaa.append(bbb)
>>> aaa
[1, 2, 3, [4, 5, 6]]
>>>
This fails the ndb.IntegerProperty test.
Try:
terry.weeksEW += terry.week
terry.weeksNS += terry.week
EDIT: To save a list of lists, do not use IntegerProperty(); use JsonProperty() instead. Better still, the ndb datastore is deprecated, so I recommend Firestore, which uses JSON objects by default. At least use Cloud Datastore or Cloud NDB.
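A hedged sketch of what the JsonProperty variant might look like, reusing the property names from the question (JsonProperty stores any JSON-serializable value, so a list of lists is fine):

from google.appengine.ext import ndb

class Account(ndb.Model):
    week = ndb.IntegerProperty(repeated=True)  # still a flat list of ints
    weeksNS = ndb.JsonProperty()               # can hold a list of lists
    weeksEW = ndb.JsonProperty()

terry = Account(week=[], weeksNS=[], weeksEW=[])
terry.week = [0, 1, 2, 3]
terry.weeksEW = (terry.weeksEW or []) + [terry.week]  # append the whole week as one element
terry.put()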
I'm adapting a script.py to achieve transfer learning. I found many scripts that retrain a model from TFRecord files, but none of them worked for me because of issues with TF 2.0 and contrib, so I'm trying to convert a script to work with TF2 and with my model.
This is my script at the moment:
from __future__ import absolute_import, division, print_function, unicode_literals

import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

keras = tf.keras

EPOCHS = 1

# Data preprocessing
import pathlib
#data_dir = tf.keras.utils.get_file(origin="/home/pi/venv/raccoon_dataset/", fname="raccoons_dataset")
#data_dir = pathlib.Path(data_dir)
data_dir = "/home/pi/.keras/datasets/ssd_mobilenet_v1_coco_2018_01_28/saved_model/saved_model.pb"

######################
# Read the TFRecords #
######################
def imgs_input_fn(filenames, perform_shuffle=False, repeat_count=1, batch_size=1):
    def _parse_function(serialized):
        features = {
            'image': tf.io.FixedLenFeature([], tf.string),
            'label': tf.io.FixedLenFeature([], tf.int64)
        }
        # Parse the serialized data so we get a dict with our data.
        parsed_example = tf.io.parse_single_example(serialized=serialized,
                                                    features=features)
        print("\nParsed example:\n", parsed_example, "\nEnd of parsed example:\n")
        # Get the image as raw bytes.
        image_shape = tf.stack([300, 300, 3])
        image_raw = parsed_example['image']
        label = tf.cast(parsed_example['label'], tf.float32)
        # Decode the raw bytes so it becomes a tensor with type.
        image = tf.io.decode_raw(image_raw, tf.uint8)
        image = tf.cast(image, tf.float32)
        image = tf.reshape(image, image_shape)
        #image = tf.subtract(image, 116.779)  # Zero-center by mean pixel
        #image = tf.reverse(image, axis=[2])  # 'RGB'->'BGR'
        d = dict(zip(["image"], [image])), [label]
        return d

    dataset = tf.data.TFRecordDataset(filenames=filenames)
    # Parse the serialized data in the TFRecords files.
    # This returns TensorFlow tensors for the image and labels.
    #print("\nDataset before parsing:\n", dataset, "\n")
    dataset = dataset.map(_parse_function)
    #print("\nDataset after parsing:\n", dataset, "\n")
    if perform_shuffle:
        # Randomizes input using a window of 256 elements (read into memory)
        dataset = dataset.shuffle(buffer_size=256)
    dataset = dataset.repeat(repeat_count)  # Repeats dataset this # times
    dataset = dataset.batch(batch_size)  # Batch size to use
    print("\nDataset batched:\n", dataset, "\nEnd dataset\n")
    iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
    print("\nIterator shape:\n", tf.compat.v1.data.get_output_shapes(iterator), "\nEnd\n")
    #print("\nIterator:\n", iterator.get_next(), "\nEnd Iterator\n")
    batch_features, batch_labels = iterator.get_next()
    return batch_features, batch_labels

raw_train = tf.compat.v1.estimator.TrainSpec(
    input_fn=imgs_input_fn("/home/pi/venv/raccoon_dataset/data/train.record",
                           perform_shuffle=True,
                           repeat_count=5,
                           batch_size=20),
    max_steps=1)
and this is the resulting output:
Parsed example:
{'image': <tf.Tensor 'ParseSingleExample/ParseSingleExample:0' shape=() dtype=string>, 'label': <tf.Tensor 'ParseSingleExample/ParseSingleExample:1' shape=() dtype=int64>}
End of parsed example:
Dataset batched:
<BatchDataset shapes: ({image: (None, 300, 300, 3)}, (None, 1)), types: ({image: tf.float32}, tf.float32)>
End dataset
Iterator shape:
({'image': TensorShape([None, 300, 300, 3])}, TensorShape([None, 1]))
End
2019-11-20 14:01:14.493817: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image (data type: string) is required but could not be found.
2019-11-20 14:01:14.495019: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at iterator_ops.cc:929 : Invalid argument: {{function_node __inference_Dataset_map__parse_function_27}} Feature: image (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]]
Traceback (most recent call last):
File "transfer_learning.py", line 127, in <module>
batch_size=20),
File "transfer_learning.py", line 107, in imgs_input_fn
batch_features, batch_labels = iterator.get_next()
File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 737, in get_next
return self._next_internal()
File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 651, in _next_internal
output_shapes=self._flat_output_shapes)
File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2673, in iterator_get_next_sync
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map__parse_function_27}} Feature: image (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]] [Op:IteratorGetNextSync]
I don't know what I'm doing wrong.
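As a side note, one thing worth checking is whether the records really contain features named image and label, rather than keys such as image/encoded as in many object-detection datasets. A small sketch that parses one raw record and prints its feature keys (the path is taken from the question):

import tensorflow as tf

raw_dataset = tf.data.TFRecordDataset("/home/pi/venv/raccoon_dataset/data/train.record")
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(list(example.features.feature.keys()))  # feature names actually stored in the record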
I am developing a web application that takes a Word file and performs tokenization.
I noticed that the document is passed correctly from angularJS to Flask, but there is an error that I cannot explain:
Traceback (most recent call last):
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\_compat.py", line 39, in reraise
raise value
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\flask\app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "app.py", line 43, in tokenizer
myDoc = word.Documents.Open(pathToProc, False, False, True) #stackoverflow
File "C:\Users\AOUP\MiniAnaconda\lib\site-packages\win32com\client\dynamic.py", line 527, in __getattr__
raise AttributeError("%s.%s" % (self._username_, attr))
AttributeError: Word.Application.Documents
The document is passed by angularJS with the following code:
var f = document.getElementsByTagName("form")[0].children[1].files[0].name;
if (f != ""){
$http({
url: '/tokenizeDoc',
method: "GET",
params: {doc : f}
});
}
It is then read by Flask with the following script; the error occurs on the line marked with the error comment:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import string
import win32com.client
import nltk
import os
from collections import Counter
from pywintypes import com_error
from flask import request, Flask, render_template, jsonify

word = win32com.client.Dispatch("Word.Application")
word.Visible = False

app = Flask(__name__)

@app.route('/')
def landingPage():
    return render_template('homepage.html')

@app.route('/tokenizeDoc', methods=['GET'])
def tokenizer():
    if request.method == 'GET':
        pathToProc = request.values.get("doc")
        sent_tokenizer = nltk.data.load('tokenizers/punkt/italian.pickle')
        it_stop_words = nltk.corpus.stopwords.words('italian') + ['\n', '\t', '']
        trashes = it_stop_words + list(string.punctuation)
        tokensTOT = []
        try:
            myDoc = word.Documents.Open(pathToProc, False, False, True)  # ERROR!!!
            sentences = sent_tokenizer.tokenize(word.ActiveDocument.Range().Text)
            myDoc.Close()
            del myDoc
            for sentence in sentences:
                tokensTOT = tokensTOT + [t.lower() for t in nltk.word_tokenize(sentence)
                                         if t.lower() not in trashes]
        except com_error:
            print('IMPOSSIBILE DECIFRARE IL FILE')  # "unable to decipher the file"
    return ''
I hope the win32com library is not incompatible with web frameworks and someone can give me an answer.
Many thanks in advance.
Use os.path.abspath(pathToProc) instead of the bare pathToProc in the line that fails (myDoc = word.Documents.Open(pathToProc, False, False, True) #ERROR!!!), as sketched below.
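A sketch of that change, keeping the other Open arguments from the question (the idea being that Word resolves relative paths against its own working directory rather than the Flask process's, so an absolute path is safer):

import os

myDoc = word.Documents.Open(os.path.abspath(pathToProc), False, False, True)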
I faced the same problem.
The only solution that I managed to apply is to execute win32com.client.Dispatch individually for each call of the view function.
You also need to call pythoncom.CoInitialize() for Flask's multi-threading to work properly:
import pythoncom
import win32com.client

def get_word_instance():
    pythoncom.CoInitialize()
    return win32com.client.Dispatch("Word.Application")

@app.route('/tokenizeDoc', methods=['GET'])
def tokenizer():
    word = get_word_instance()
    word.Visible = False
    # ... rest of the view as before
Creating a COM object instance can be a resource-intensive process. If you need more performance when working with COM objects, you may want to consider disabling multithreading in the Flask application:
app.run(threaded=False)
Then you can use your code without any changes.