GAE Full Text Search development console UnicodeEncodeError

I have an index with many accented words (e.g. São Paulo, José, etc.).
The Search API works fine, but when I try to run some test queries in the development console, I can't access the index data.
This error only occurs in the development environment. On production GAE everything works fine.
Below is the traceback:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
handler.get(*groups)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 1704, in get
'values': self._ProcessSearchResponse(resp),
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 1664, in _ProcessSearchResponse
value = TruncateValue(doc.fields[field_name].value)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/admin/__init__.py", line 158, in TruncateValue
value = str(value)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 5: ordinal not in range(128)
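The traceback points at the root cause: the dev console's TruncateValue helper calls str() on the field value, and in Python 2 str() encodes with the ASCII codec, so any accented character fails. A minimal sketch reproducing the failure (the value is illustrative):

# -*- coding: utf-8 -*-
value = u'S\xe3o Paulo'        # any field value with an accented character
try:
    print str(value)           # what TruncateValue does; the ASCII codec fails
except UnicodeEncodeError as e:
    print 'fails as in the console: %s' % e
print value.encode('utf-8')    # explicit UTF-8 encoding works fine

Since the Search API itself works and production is unaffected, this looks like a display-path bug in the development console rather than a problem with the index data.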

Related

Google App Engine: NDB record counts from an NDB model

How many records can we get from Google App Engine in a single query, so that we can display the count to the user? And can we increase the timeout limit from 3 seconds to 5 seconds?
In my experience, ndb cannot pull more than 1000 records at a time. Here is an example of what happens if I try to use .count() on a table that contains ~500,000 records.
s~project-id> models.Transaction.query().count()
WARNING:root:suspended generator _count_async(query.py:1330) raised AssertionError()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/utils.py", line 160, in positional_wrapper
return wrapped(*args, **kwds)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1287, in count
return self.count_async(limit, **q_options).get_result()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
self.check_success()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1330, in _count_async
batch = yield rpc
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
result = rpc.get_result()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 614, in get_result
return self.__get_result_hook(self)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_query.py", line 2910, in __query_result_hook
self._batch_shared.conn.check_rpc_success(rpc)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1377, in check_rpc_success
rpc.check_success()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 580, in check_success
self.__rpc.CheckSuccess()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
self.request, self.response)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 308, in MakeSyncCall
handler(request, response)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 362, in _Dynamic_Next
assert next_request.offset() == 0
AssertionError
To bypass this, you can do something like:
objs = []
q = None
more = True
while more:
    _objs, q, more = models.Transaction.query().fetch_page(300, start_cursor=q)
    objs.extend(_objs)
But even that will eventually hit memory/timeout limits.
Currently I use Google Dataflow to pre-compute these values and store the results in Datastore as the models DaySummaries & StatsPerUser.
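For context, a rough sketch of what such pre-computed summary models might look like (the kind names come from the answer above; the properties are hypothetical):

from google.appengine.ext import ndb

class DaySummaries(ndb.Model):
    # One entity per day, written by the Dataflow job.
    date = ndb.DateProperty()
    transaction_count = ndb.IntegerProperty()

class StatsPerUser(ndb.Model):
    # One entity per user.
    user_id = ndb.StringProperty()
    transaction_count = ndb.IntegerProperty()

Reading a single pre-computed entity costs the same no matter how many Transaction records exist, which sidesteps both the count limits and the timeouts described above.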
EDIT:
snakecharmerb is correct. I was able to use .count() in the production environment, but the more entities it has to count, the longer it seems to take. Here's a screenshot of my logs viewer where it took ~15 seconds to count ~330,000 records.
When I tried adding a filter to that query which returned a count of ~4500, it took about a second to run instead.
EDIT #2:
OK, I had another App Engine project with a kind containing ~8,000,000 records. I tried to run .count() on that in my HTTP request handler, and the request timed out after running for 60 seconds.

BadValueError: Entity has uninitialized properties ___ after resetting indexes and clearing memcache

I'm working on a Google App Engine application in Python. I tried to switch a query I was running for one of my routes to query on only one property instead of two, which caused an indexing error to appear whenever we tried running that query.
It was something along the lines of "No index matching the specified parameters could be found", but I don't have any screenshots at the moment. In order to rectify the situation, we ran appcfg.py vacuum_indices and deleted all indices related to the original search. We then uploaded a new index.yaml specifying the new index. Though we could see on the admin panel that the new indexes had indeed been created, and the old ones were gone, we were still getting the same error.
We're really unsure why this is happening, and are having trouble finding documentation online for these issues. Our next thought was that some previous state in memcache was causing the query to attempt to use its old index. So we flushed memcache, and now we're getting this error:
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~dev-erpetcloud2/dev1.392600188150722624/routes/users.py", line 172, in post
res_dict = cp_user.to_dict()
File "/base/data/home/apps/s~dev-erpetcloud2/dev1.392600188150722624/routes/models/../models/cp_models.py", line 248, in to_dict
animal_dict = animal.to_dict()
File "/base/data/home/apps/s~dev-erpetcloud2/dev1.392600188150722624/routes/models/../models/cp_models.py", line 574, in to_dict
protocol, params = self.get_protocol_and_params()
File "/base/data/home/apps/s~dev-erpetcloud2/dev1.392600188150722624/routes/models/../models/cp_models.py", line 395, in get_protocol_and_params
record = self.protocol_state_key.get()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/key.py", line 572, in get
return self.get_async(**ctx_options).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 342, in get_result
self.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 389, in _help_tasklet_along
value = gen.send(val)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 765, in get
pbs = entity._to_pb(set_key=False).SerializePartialToString()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3158, in _to_pb
self._check_initialized()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3014, in _check_initialized
'Entity has uninitialized properties: %s' % ', '.join(baddies))
BadValueError: Entity has uninitialized properties: title
Looking through the datastore, the entity that this trace references definitely does have a 'title' property.
I've looked around a lot for errors that can arise from deleting indices and flushing memcache, and nothing useful has come up.
If someone could perhaps give me a bit of insight into what could be happening here and how these systems work (my mental model might be off), or point me in the right direction, that would be fantastic. Thanks!!
This error signifies that the property 'title' has been specified as a required property, but you are trying to write an entity to the datastore without initializing that property. This error normally occurs at the time of put(). By any chance, did you make any changes to the entity definition, or to the part of the code which writes these entities to the datastore?
Edit: the error can also happen while trying to read an entity which has no value specified for a 'required' property.
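A minimal sketch of how this arises with an ndb required property (model and property names are illustrative):

from google.appengine.ext import ndb

class Animal(ndb.Model):
    title = ndb.StringProperty(required=True)

animal = Animal()   # 'title' is never assigned
animal.put()        # raises BadValueError: Entity has uninitialized properties: title

Note that in the traceback above the error surfaces during a get(): the _to_pb call in context.py is NDB serializing the entity for its cache, and that serialization runs the same _check_initialized() validation, which is how a read can trip over an entity missing a required value.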

Trying to upload compressed data (unicode) via bulkloader

I ran into an issue where the data being uploaded to a db.Text property was over 1 MB, so I compressed the information using zlib. The bulkloader by default didn't support the unicode data being uploaded, so I swapped out the source code to use unicodecsv rather than Python's built-in csv module. The problem I'm running into is that Google App Engine's bulkloader is unable to handle the unicode characters (even though the db.Text property is unicode).
[ERROR ] [Thread-12] DataSourceThread:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1611, in run
self.PerformWork()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1730, in PerformWork
for item in content_gen.Batches():
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 542, in Batches
self._ReadRows(key_start, key_end)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 452, in _ReadRows
row = self.reader.next()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 219, in generate_import_record
for input_dict in self.dict_generator:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 188, in next
row = csv.DictReader.next(self)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 106, in next
row = self.reader.next()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 55, in utf8_recoder
for line in codecs.getreader(encoding)(stream):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 612, in next
line = self.readline()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 527, in readline
data = self.read(readsize, firstline=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 474, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 29: invalid start byte
I know that for my local testing I could modify the Python files to use the unicodecsv module instead, but that doesn't solve the problem for GAE's Datastore in production. Is there an existing solution to this problem that anyone is aware of?
Solved this the other week: you just need to base64-encode the results so you won't have any issues with the bulkloader. Size increases by 30-50%, but since zlib had already compressed my data to 10% of its original size, this wasn't too bad.
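A minimal sketch of the idea (helper names are hypothetical): zlib output is raw binary, which is exactly what trips the UTF-8 decode in the traceback above, so base64-encoding keeps the CSV ASCII-safe at the cost of the 30-50% size overhead mentioned:

import base64
import zlib

def encode_for_csv(text):
    # Compress the unicode text, then base64-encode so the bulkloader's
    # CSV machinery only ever sees ASCII bytes.
    return base64.b64encode(zlib.compress(text.encode('utf-8')))

def decode_from_csv(field):
    # Reverse of the above, run when reading the value back.
    return zlib.decompress(base64.b64decode(field)).decode('utf-8')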

Google AppEngine backups reporting ApiTemporaryUnavailableError

When trying to back up the datastore from the Datastore Admin page, backups fail with an error for both blobstore and Cloud Storage targets.
Call stack for Cloud Storage:
ApplicationError: 1
Traceback (most recent call last):
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 642, in _ProcessPostRequest
10)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 492, in _perform_backup
gs_bucket_name = validate_and_canonicalize_gs_bucket(gs_bucket_name)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 1803, in validate_and_canonicalize_gs_bucket
verify_bucket_writable(bucket_name)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 1763, in verify_bucket_writable
test_file = files.open(files.gs.create(file_name), 'a', exclusive_lock=True)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/gs.py", line 331, in create
return files._create(_GS_FILESYSTEM, filename=filename, params=params)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 650, in _create
_make_call('Create', request, response)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 255, in _make_call
_raise_app_error(e)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 183, in _raise_app_error
raise ApiTemporaryUnavailableError(e)
ApiTemporaryUnavailableError: ApplicationError: 1
Call stack for blobstore:
ApplicationError: 1
Traceback (most recent call last):
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 716, in call
handler.post(*groups)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/base_handler.py", line 147, in post
self.handle()
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1391, in handle
state)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1539, in _schedule_shards
mr_state.writer_state)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/output_writers.py", line 726, in create
acl=acl)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/output_writers.py", line 640, in _create_file
return files.blobstore.create(mime_type, filename)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/blobstore.py", line 75, in create
return files._create(_BLOBSTORE_FILESYSTEM, params=params)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 650, in _create
_make_call('Create', request, response)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 255, in _make_call
_raise_app_error(e)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 183, in _raise_app_error
raise ApiTemporaryUnavailableError(e)
ApiTemporaryUnavailableError: ApplicationError: 1
This seems to be a problem with the underlying Files API, which is out of our control. Has anyone come across this and found a solution or workaround?
I think this may have to do with the fact that the Files API is deprecated.
I had the same error trying to push a mapreduce pipeline to the blobstore (as George-Bogdan is having). I decided to finally make the switch I had been avoiding: switching to the GCS client library. Once I finished the switch, my tests were conclusive: this works properly.
It seems like a transient issue on the Google side, but since the Files API is deprecated, I feel safer using the library they actually suggest using now.
(answer copied from my own answer on another question)
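For reference, a minimal sketch of writing a file with the GCS client library (cloudstorage) instead of the deprecated Files API; the bucket and object names are placeholders:

import cloudstorage as gcs

def write_test_file(data):
    # The GCS client library addresses objects as '/bucket-name/object-name'.
    with gcs.open('/my-backup-bucket/test-file', 'w',
                  content_type='text/plain') as f:
        f.write(data)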

Google App Engine Application Error 5

I frequently get this application error. What does it mean?
File "/base/data/home/apps/0xxopdp/10.347467753731922836/matrices.py", line 215, in insert_into_db
obj.put()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 895, in put
return datastore.Put(self._entity, config=config)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 404, in Put
return _GetConnection().async_put(config, entities, extra_hook).get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 601, in get_result
self.check_success()
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 572, in check_success
rpc.check_success()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 502, in check_success
self.__rpc.CheckSuccess()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
raise self.exception
ApplicationError: ApplicationError: 5
I do make many calls to the datastore. What caused this problem?
The ApplicationError: 5 message typically indicates a timeout error.
The error is raised by the datastore API, so your application is probably trying to make more than the allowed 5 writes per second to the datastore.
I would recommend reading this insightful article about Handling Datastore Errors, which explains very well the possible causes of timeouts and how to deal with them.
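A common mitigation, sketched here under the assumption that the failing call is a db put (the helper name is hypothetical): catch the timeout and retry with exponential backoff.

import time

from google.appengine.ext import db

def put_with_backoff(entity, attempts=5):
    # Retry puts that fail with a datastore Timeout, backing off
    # exponentially between attempts.
    for attempt in range(attempts):
        try:
            return entity.put()
        except db.Timeout:
            if attempt == attempts - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))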
