Getting record counts from an NDB model - google-app-engine

How many records can we get from Google App Engine in a single query, so that we can display a count to the user? And can we increase the timeout limit from 3 seconds to 5 seconds?

In my experience, ndb cannot pull more than 1000 records at a time. Here is an example of what happens if I try to use .count() on a table that contains ~500,000 records.
s~project-id> models.Transaction.query().count()
WARNING:root:suspended generator _count_async(query.py:1330) raised AssertionError()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/utils.py", line 160, in positional_wrapper
return wrapped(*args, **kwds)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1287, in count
return self.count_async(limit, **q_options).get_result()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
self.check_success()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1330, in _count_async
batch = yield rpc
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
result = rpc.get_result()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 614, in get_result
return self.__get_result_hook(self)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_query.py", line 2910, in __query_result_hook
self._batch_shared.conn.check_rpc_success(rpc)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1377, in check_rpc_success
rpc.check_success()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 580, in check_success
self.__rpc.CheckSuccess()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
self.request, self.response)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 308, in MakeSyncCall
handler(request, response)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 362, in _Dynamic_Next
assert next_request.offset() == 0
AssertionError
To bypass this, you can do something like:
objs = []
cursor = None
more = True
while more:
    _objs, cursor, more = models.Transaction.query().fetch_page(
        300, start_cursor=cursor)
    objs.extend(_objs)
But even that will eventually hit memory/timeout limits.
Currently I use Google Dataflow to pre-compute these values and store the results in Datastore as the summary models DaySummaries & StatsPerUser.
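As a rough illustration of that approach (a sketch only; KindCount and get_cached_count are hypothetical names, not the actual DaySummaries/StatsPerUser schemas), the request-time read becomes a single cheap entity lookup:
from google.appengine.ext import ndb

class KindCount(ndb.Model):
    # Hypothetical summary entity, refreshed by a background job
    # (Dataflow, cron, or a task queue) instead of counting per request.
    kind_name = ndb.StringProperty(required=True)
    count = ndb.IntegerProperty(default=0)
    updated = ndb.DateTimeProperty(auto_now=True)

def get_cached_count(kind_name):
    # Request-time read: one small entity instead of a full keys-only scan.
    summary = KindCount.query(KindCount.kind_name == kind_name).get()
    return summary.count if summary else 0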
EDIT:
snakecharmerb is correct. I was able to use .count() in the production environment, but the more entities it has to count, the longer it seems to take. Here's a screenshot of my logs viewer, where it took ~15 seconds to count ~330,000 records.
When I tried adding a filter to that query which returned a count of ~4500, it took about a second to run instead.
EDIT #2:
OK, I had another App Engine project with a kind containing ~8,000,000 records. I tried to run .count() on that in my HTTP request handler, and the request timed out after running for 60 seconds.
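If an exact total isn't needed for display, one way to keep latency bounded (a minimal sketch, using ndb's standard Query.count(limit=...) parameter; the cap of 1000 is an arbitrary choice) is to stop counting past a cap:
COUNT_CAP = 1000

def bounded_count(query, cap=COUNT_CAP):
    # Query.count() accepts a limit; the keys-only scan stops once
    # cap + 1 entities are seen, so the request can't run for minutes.
    n = query.count(limit=cap + 1)
    return str(n) if n <= cap else '%d+' % cap
For the kinds above, bounded_count(models.Transaction.query()) would display "1000+" almost immediately instead of timing out.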

Related

Google AppEngine backups reporting ApiTemporaryUnavailableError

When trying to back up the datastore from the Datastore admin page, backups fail with an error for both blobstore and cloud store targets:
Callstack for cloud store:
ApplicationError: 1
Traceback (most recent call last):
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 642, in _ProcessPostRequest
10)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 492, in _perform_backup
gs_bucket_name = validate_and_canonicalize_gs_bucket(gs_bucket_name)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 1803, in validate_and_canonicalize_gs_bucket
verify_bucket_writable(bucket_name)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/datastore_admin/backup_handler.py", line 1763, in verify_bucket_writable
test_file = files.open(files.gs.create(file_name), 'a', exclusive_lock=True)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/gs.py", line 331, in create
return files._create(_GS_FILESYSTEM, filename=filename, params=params)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 650, in _create
_make_call('Create', request, response)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 255, in _make_call
_raise_app_error(e)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 183, in _raise_app_error
raise ApiTemporaryUnavailableError(e)
ApiTemporaryUnavailableError: ApplicationError: 1
Callstack for blob store:
ApplicationError: 1
Traceback (most recent call last):
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 716, in call
handler.post(*groups)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/base_handler.py", line 147, in post
self.handle()
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1391, in handle
state)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1539, in _schedule_shards
mr_state.writer_state)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/output_writers.py", line 726, in create
acl=acl)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/mapreduce/output_writers.py", line 640, in _create_file
return files.blobstore.create(mime_type, filename)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/blobstore.py", line 75, in create
return files._create(_BLOBSTORE_FILESYSTEM, params=params)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 650, in _create
_make_call('Create', request, response)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 255, in _make_call
_raise_app_error(e)
File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/files/file.py", line 183, in _raise_app_error
raise ApiTemporaryUnavailableError(e)
ApiTemporaryUnavailableError: ApplicationError: 1
Seems to be a problem with the underlying Files API, which is out of our control. Anyone come across this and have a solution or workaround?
I think this may have to do with the fact that the Files API is deprecated.
I had the same error trying to push a mapreduce pipeline to the blobstore (as George-Bogdan is having). I decided to finally make the switch I had been avoiding (switching to the GCS client library). Once I finished the switch, my tests were conclusive: this works properly.
It seems like a transient issue on the Google side, but since the Files API is deprecated, I feel safer using the library they actually suggest using now.
(answer copied from my own answer on another question)
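For reference, a minimal sketch of the equivalent write using the GCS client library (cloudstorage) rather than the deprecated Files API; the bucket and object names here are placeholders:
import cloudstorage as gcs

def write_object(data):
    # The GCS client library takes '/bucket/object' paths;
    # '/my-bucket/backups/test.txt' is a placeholder, not a real bucket.
    with gcs.open('/my-bucket/backups/test.txt', 'w',
                  content_type='text/plain') as f:
        f.write(data)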

Backing up large datastore kind (1TB+) to google cloud storage

Has anyone been successful in backing up large datastore kinds to Cloud Storage? This is an experimental feature, so support is pretty sketchy on the Google end.
The kind in question, which we want to back up to Cloud Storage (ultimately with the goal of ingesting it from Cloud Storage into BigQuery), is currently sitting at 1.2 TB in size.
- description: BackUp
  url: /_ah/datastore_admin/backup.create?name=OurApp&filesystem=gs&gs_bucket_name=OurBucket&queue=backup&kind=LargeKind
  schedule: every day 00:00
  timezone: America/Regina
  target: ah-builtin-python-bundle
We keep running into the following error message:
Traceback (most recent call last):
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 182, in handle
input_reader, shard_state, tstate, quota_consumer, ctx)
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 263, in process_inputs
entity, input_reader, ctx, transient_shard_state):
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 318, in process_data
output_writer.write(output, ctx)
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 711, in write
ctx.get_pool("file_pool").append(self._filename, str(data))
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 266, in append
self.flush()
File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 288, in flush
f.write(data)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 297, in __exit__
self.close()
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 291, in close
self._make_rpc_call_with_retry('Close', request, response)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 427, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 250, in _make_call
rpc.check_success()
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 570, in check_success
self.__rpc.CheckSuccess()
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
raise self.exception
DeadlineExceededError: The API call file.Close() took too long to respond and was cancelled.
There seems to be an undocumented time limit of 30 seconds for write operations from GAE to Cloud Storage. This also applies to writes made on a backend, so the maximum file size you can create from GAE in Cloud Storage depends on your throughput. Our solution is to split the file: each time the writer task approaches 20 seconds, it closes the current file and opens a new one, and then we join these files locally. For us this results in files of about 500 KB (compressed), so this might not be an acceptable solution for you...
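A minimal sketch of that splitting scheme, assuming the same Files API interface that appears in the tracebacks above (files.gs.create / files.open / files.finalize); the 20-second budget and part-naming are our own choices:
import time
from google.appengine.api import files

TIME_BUDGET = 20  # seconds of writing before rotating to a new part

def write_in_parts(chunks, bucket, base_name):
    # Spread the output across several GCS files, finalizing each part
    # before the ~30s write limit can hit; the parts are joined later.
    part = 0
    while chunks:
        file_name = files.gs.create(
            '/gs/%s/%s.part%d' % (bucket, base_name, part))
        started = time.time()
        with files.open(file_name, 'a') as f:
            while chunks and time.time() - started < TIME_BUDGET:
                f.write(chunks.pop(0))
        files.finalize(file_name)
        part += 1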

Appengine LogService has an undocumented quota - up to 1,000,000 reads per day; know a workaround?

Appengine LogService has an undocumented quota:
You can make up to 1,000,000 reads from it per day; beyond that, you'll receive the following error:
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
handler.get(*groups)
File "/base/data/home/apps/xxx/3.356325783019142341/xxx.py", line 355, in get
for request_log in logservice.fetch(start_time=start_time, end_time=end_time, version_ids=["3"]):
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/logservice/logservice.py", line 414, in __iter__
self._advance()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/logservice/logservice.py", line 427, in _advance
response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 308, in MakeSyncCall
rpc.CheckSuccess()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
raise self.exception
OverQuotaError: The API call logservice.Read() required more quota than is available.
Also, when you reach this quota, a new quota line appears on your dashboard (AFAIK you don't see this line there before).
At that point the quota isn't documented at all, and it seems it isn't billable either.
See this too: http://groups.google.com/group/google-appengine/browse_thread/thread/61fac55e1a2d521
Hope it'll save you some time.
Let me know if you can think of a workaround... (just to make it a question ;) )
You can request to have your limits increased: http://support.google.com/code/bin/request.py?&contact_type=AppEngineCPURequest
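The only other mitigation I can think of is to read fewer records: persist where the previous run stopped, so each run fetches only new log entries. A sketch, assuming the logservice.fetch() call from the traceback; the 'log_cursor' memcache key is a placeholder of ours:
from google.appengine.api import memcache
from google.appengine.api.logservice import logservice

def fetch_new_logs(version_ids=None):
    # Resume from the previous run's high-water mark so the same
    # records are never read twice, keeping daily reads down.
    start_time = memcache.get('log_cursor') or 0
    newest = start_time
    for log in logservice.fetch(start_time=start_time,
                                version_ids=version_ids):
        # fetch() yields newest-first; remember the highest end_time.
        newest = max(newest, log.end_time)
        yield log
    memcache.set('log_cursor', newest)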

GAE error (Error code 104) while creating a new blob

On GAE this line of code:
file_name = files.blobstore.create(mime_type='image/png')
raises google.appengine.runtime.DeadlineExceededError
Here is the full method code:
class UploadsHandler(JSONRequestHandler):
    def upload_blob(self, content, filename):
        file_name = files.blobstore.create(mime_type='image/png')
        file_str_list = split_len(content, 65520)
        with files.open(file_name, 'a') as f:
            for line in file_str_list:
                f.write(line)
        files.finalize(file_name)
        return files.blobstore.get_blob_key(file_name)
The log message ends with:
A serious problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
Full error stack:
<class 'google.appengine.runtime.DeadlineExceededError'>:
Traceback (most recent call last):
File "/base/data/home/apps/s~mockup-cloud/1.352909931378411668/main.py", line 389, in main
util.run_wsgi_app(application)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
run_bare_wsgi_app(add_wsgi_middleware(application))
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 116, in run_bare_wsgi_app
result = application(env, _start_response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/base/data/home/apps/s~mockup-cloud/1.352909931378411668/main.py", line 339, in post
original_key = "%s" % self.upload_blob(src)
File "/base/data/home/apps/s~mockup-cloud/1.352909931378411668/main.py", line 268, in upload_blob
file_name = files.blobstore.create(mime_type='image/png')
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/blobstore.py", line 68, in create
return files._create(_BLOBSTORE_FILESYSTEM, params=params)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 487, in _create
_make_call('Create', request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 228, in _make_call
rpc.wait()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 533, in wait
self.__rpc.Wait()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 119, in Wait
rpc_completed = self._WaitImpl()
File "/base/python_runtime/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 131, in _WaitImpl
rpc_completed = _apphosting_runtime___python__apiproxy.Wait(self)
The blob is created during a file upload. Other methods of the app work great. It looks like the blobstore is not responding within 30 seconds.
Any ideas why this happens?
Thanks!
Seems like you're not the only one having this issue:
http://groups.google.com/group/google-appengine/browse_thread/thread/27e52484946cbdc1#
(posted today)
Seems like Google had reconfigured some of their servers. Now everything's working fine, as it was before.
A runtime.DeadlineExceededError occurs when your request handler takes too long to execute - the blobstore call just happened to be what was running when that happened. You need to profile your handler with appstats to see why it's so slow.
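For reference, the standard Appstats recipe (a sketch; this is the stock appengine_config.py hook from the SDK, with no app-specific names):
# appengine_config.py
def webapp_add_wsgi_middleware(app):
    # Record every RPC (datastore, blobstore, ...) made by each
    # request so the timings can be inspected at /_ah/stats.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
Enable the viewer by adding the appstats builtin to app.yaml, then browse to /_ah/stats to see which RPC (or which piece of app code) is eating the 30 seconds.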

Google App Engine Application Error 5

I frequently get this Application error. What does this mean?
File "/base/data/home/apps/0xxopdp/10.347467753731922836/matrices.py", line 215, in insert_into_db
obj.put()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 895, in put
return datastore.Put(self._entity, config=config)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 404, in Put
return _GetConnection().async_put(config, entities, extra_hook).get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 601, in get_result
self.check_success()
File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 572, in check_success
rpc.check_success()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 502, in check_success
self.__rpc.CheckSuccess()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 126, in CheckSuccess
raise self.exception
ApplicationError: ApplicationError: 5
I do make many calls to the datastore. What caused this problem?
The ApplicationError: 5 message typically indicates a timeout error.
The error is raised by the datastore API, so your application is probably trying to make more than the allowed 5 writes per second to the db.
I would recommend reading this insightful article about Handling Datastore Errors, which explains very well the possible causes of timeouts and how to deal with them.
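A minimal sketch of the retry-with-backoff pattern that article recommends, using the db API from your traceback; the retry count and delays are arbitrary choices:
import time
from google.appengine.ext import db

def put_with_retry(entity, retries=3, base_delay=0.1):
    # Retry transient datastore timeouts with exponential backoff
    # instead of surfacing ApplicationError: 5 to the user.
    for attempt in range(retries + 1):
        try:
            return entity.put()
        except (db.Timeout, db.TransactionFailedError):
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))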
