Based on the documentation for Objectify and Google Cloud Datastore, I would expect the queries and the batch loads in the following code to execute in parallel:
List<Iterable<Key<MyType>>> results = new ArrayList<>();
for (...) {
    results.add(ofy().load()
            .type(MyType.class)
            .filter(...)
            .keys()
            .iterable());
}
...
Iterable<Key<MyType>> keys = ...;
Collection<MyType> c = ofy().load().keys(keys).values();
But the trace makes it look like each query and each entity load executes in sequence:
What gives?
It looks like this only happens when doing a cached get from Memcache. With similar code I see the expected async behavior for datastore_v3.Get/Put/Delete:
It seems the reason for this is that Objectify doesn't use AsyncMemcacheService. Indeed, there is an open issue for this on the project page, and this can also be confirmed by checking out the source and doing a grep -r AsyncMemcacheService.
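For comparison, the low-level App Engine API does offer a non-blocking memcache batch get. Here is a minimal sketch (with hypothetical keys) of the Future-based calls that Objectify is not using here:
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.Future;

import com.google.appengine.api.memcache.AsyncMemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class AsyncCacheSketch {
    // Illustrative only: Objectify (as of this question) calls the synchronous
    // MemcacheService, so its cached gets block one after another.
    public void parallelCacheGets() throws Exception {
        AsyncMemcacheService asyncCache = MemcacheServiceFactory.getAsyncMemcacheService();

        // Each getAll() returns a Future immediately; the memcache RPCs run in parallel.
        Future<Map<String, Object>> batch1 = asyncCache.getAll(Arrays.asList("key1", "key2"));
        Future<Map<String, Object>> batch2 = asyncCache.getAll(Arrays.asList("key3", "key4"));

        // Block only when the results are actually needed.
        Map<String, Object> values1 = batch1.get();
        Map<String, Object> values2 = batch2.get();
    }
}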
Regarding the serial datastore_v3.RunQuery calls: calls to ofy().load().type(...).filter(...).iterable() are 'asynchronous' only in the sense that they return immediately; the actual Datastore queries are executed serially, because the App Engine Datastore API doesn't expose an explicitly asynchronous API for queries.
Scenario: I've been stuck on this for way too long and I think the solution might be easy, but I just can't see it. This is the scenario:
cURL POST to http://localhost:8080/my_imports (raw JSON data on body)
->
MyImportsCustomHandler (extends ThreadedHttpRequestHandler) [Validations]
->
MyObjectProcessor (extends Processor) [JSON deserialize and data massage]
->
MyFirstDocumentProcessor (extends DocumentProcessor) [Set some fields and save]
The problem is that execution never reaches MyFirstDocumentProcessor, likely because the request didn't start from the document_api endpoints (intentionally).
There are no errors thrown; the processing route just never reaches the document processor chain. I think it should, because in MyObjectProcessor I'm doing:
DocumentType type =
localDocHandler.getDocumentTypeManager().getDocumentType("my_doc");
DocumentId id = new DocumentId("id:default:my_doc::2");
Document document = new Document(type, id);
DocumentPut docPut = new DocumentPut(document);
Processing proc = com.yahoo.docproc.Processing.of(docPut);
I got this idea from here: https://github.com/vespa-engine/vespa/blob/master/docproc/src/test/java/com/yahoo/docproc/util/SplitterJoinerTestCase.java
but in that test I see the line splitter.process(p);, for which I can't find a suitable replacement that works inside a Processor; in that context I only have the Request, the Execution and the DocumentProcessingHandler.
I hope somebody versed in Vespa can shed some light on this; it's just the last hop in the processing chain that I can't bridge :|
To write documents from Java code, you need to use the Document Access API:
http://docs.vespa.ai/documentation/document-api-guide.html#document-access
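As a rough sketch of that approach (the class names and session setup here are assumptions based on the guide above, so verify them against your Vespa version):
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.DocumentType;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;

public class MyDocWriter {
    public void writeOne() {
        // Connects to the document API of the running Vespa application.
        DocumentAccess access = DocumentAccess.createDefault();
        SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());

        DocumentType type = access.getDocumentTypeManager().getDocumentType("my_doc");
        Document document = new Document(type, new DocumentId("id:default:my_doc::2"));
        // ... set document fields here before feeding ...

        // Writes the document to the content cluster via the Document API.
        session.put(new DocumentPut(document));

        session.destroy();
        access.shutdown();
    }
}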
A working solution is in https://github.com/vespa-engine/sample-apps/pull/44
I'd like to be able to set which queue to use within a pipeline, so that I can use custom settings for that pipeline in queue.yaml. The only way I can see to do this is when the stage is started, via:
first_stage = ingest.CustomPipelineA(some_data)
first_stage.start(queue_name=foo)
However, I have nested and pre-requisite pipelines, such as:
with pipeline.InOrder():
    yield CustomPipelineA(some_shared_data)
    future_b = yield CustomPipelineB(some_shared_data)
    with pipeline.After(future_b):
        future_c = yield CustomPipelineC(some_shared_data, future_b)
        with pipeline.After(future_c):
            future_d = yield CustomPipelineD(some_shared_data, future_c)
It would be nice if I could set the queue name on the constructor, but it's not possible based on the pipeline docs: https://code.google.com/p/appengine-pipeline/wiki/GettingStarted#Execution_ordering.
Any ideas?
I think it's possible in Python (but not in Java). Here's an example from the same webpage as you linked to:
stage = MySearchEnginePipeline(15)
stage.start(queue_name='pipelinequeue')
I believe I've figured this out for Execution Ordering: within the run method, you can do:
self._context.queue_name = "my-custom-queue-name"
I use two different events for the callback that responds when an IndexedDB transaction finishes or is successful:
Let's say... db : IDBDatabase object, tr : IDBTransaction object, os : IDBObjectStore object
tr = db.transaction(os_name,'readwrite');
os = tr.objectStore(os_name);
case 1:
r = os.openCursor();
r.onsuccess = function() {
    if (r.result) {
        callback_for_result_fetched();
        r.result.continue();
    } else {
        callback_for_transaction_finish();
    }
};
case 2:
tr.oncomplete = callback_for_transaction_finish;
It would be a waste if both of them work the same way. So can you tell me: is there any difference between them?
Sorry for bringing up quite an old thread, but its question is a good starting point...
I've looked for a similar question for a slightly different use case and actually found no good answers, or even misleading ones.
Think of a use case where you need to make several writes into an objectStore, or even into several of them. You definitely don't want to manage every single write and its own success and error events. That is the point of a transaction, and this is the (proper) way to implement it for IndexedDB:
var trx = dbInstance.transaction([storeIdA, storeIdB], 'readwrite'),
    storeA = trx.objectStore(storeIdA),
    storeB = trx.objectStore(storeIdB);

trx.oncomplete = function(event) {
    // this code will run only when ALL of the following requests succeed,
    // and only AFTER ALL of them have been processed
};

trx.onerror = function(error) {
    // this code will run if ANY of the following requests fails,
    // and only AFTER ALL of them have been processed
};

storeA.put({ key: keyA, value: valueA });
storeA.put({ key: keyB, value: valueB });
storeB.put({ key: keyA, value: valueA });
storeB.put({ key: keyB, value: valueB });
The clue to understanding this is found in the following statement of the W3C spec:
To determine if a transaction has completed successfully, listen to the transaction’s complete event rather than the success event of a particular request, because the transaction may still fail after the success event fires.
While it's true that these callbacks function similarly, they are not the same: the difference between onsuccess and oncomplete is that transactions complete, while requests, which are made within those transactions, succeed.
oncomplete is only defined in the spec as related to a transaction. A transaction doesn't have an onsuccess callback.
I would only caution that there is no guarantee that a successful trx.oncomplete means the data was actually written to the disk/database:
We are seeing a problem with trx.oncomplete where the data is not being written to the database on disk. Firefox has an explanation of what they did that is causing this problem here: https://developer.mozilla.org/en-US/docs/Web/API/IDBTransaction/oncomplete
It seems that Windows/Edge is having the same issue. Basically, there is no guarantee that your app will have its data written to the database if/when the user decides to kill or power down the device. We've even tried waiting up to 15 minutes before shutting down in some cases and haven't seen the data written. For me, I'd always want to ensure that a data write completes and is committed.
Are there other solutions for a real persistent database, or enhancements to the IndexedDB beyond FF experimental add...
I have implemented an App Engine Java app. It runs just fine - except that I'm having way too many datastore read operations.
So I installed the appstats tool to analyze it. It showed that, on a single request, I am doing this at one point in my code:
Query query = persistenceManager.newQuery(Info.class,
":keys.contains(key)");
List<Info> storedInfos = (List<Info>) query.execute(keys);
That single call to execute(...) results in multiple datastore_v3.Get calls. I get this stack trace multiple times:
com.google.appengine.tools.appstats.Recorder:297 makeAsyncCall()
com.google.apphosting.api.ApiProxy:184 makeAsyncCall()
com.google.appengine.api.datastore.DatastoreApiHelper:59 makeAsyncCall()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:351 doBatchGetBySize()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:400 doBatchGetByEntityGroups()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:292 get()
com.google.appengine.api.datastore.DatastoreServiceImpl:87 get()
com.google.appengine.datanucleus.WrappedDatastoreService:90 get()
com.google.appengine.datanucleus.query.DatastoreQuery:374 executeBatchGetQuery()
com.google.appengine.datanucleus.query.DatastoreQuery:278 performExecute()
com.google.appengine.datanucleus.query.JDOQLQuery:164 performExecute()
org.datanucleus.store.query.Query:1791 executeQuery()
org.datanucleus.store.query.Query:1667 executeWithArray()
org.datanucleus.api.jdo.JDOQuery:243 execute()
de.goddchen.appengine.app.InfosServlet:78 doPost()
It is even calling executeBatchGetQuery, so why is this issued multiple times?
I have already tried out some datastore/PersistenceManager settings, but none helped :(
Any ideas?
You're probably seeing the keys being broken into groups and the query being executed asynchronously. How many keys are you querying for?
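For what it's worth, the splitting happens inside the low-level datastore client rather than in your JDO code; even a single batch get through the low-level API can fan out into several datastore_v3.Get RPCs. A minimal sketch, assuming you already have the list of keys:
import java.util.List;
import java.util.Map;

import com.google.appengine.api.datastore.AsyncDatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

public class BatchGetSketch {
    public Map<Key, Entity> fetchAll(List<Key> keys) throws Exception {
        AsyncDatastoreService datastore = DatastoreServiceFactory.getAsyncDatastoreService();
        // One logical batch get; internally the keys may be split by size and
        // entity group, which shows up in appstats as multiple datastore_v3.Get calls.
        return datastore.get(keys).get();
    }
}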
I need to optimize the read/write count for a POST request that I'm using.
Some info about the request:
The user sends a JSON array of ~100 items
The servlet needs to check whether any of the received items is newer than its counterpart in the datastore, using a single long attribute
I'm using JDO
What I currently do is (pseudocode):
foreach(item : json.items) {
    storedItem = persistenceManager.getObjectById(item.key);
    if(item.long > storedItem.long) {
        // Update storedItem
    }
}
Which obviously results in ~100 read requests per request.
What is the best way to reduce the read count for this logic? Using a JDO query? I read that "IN" queries simply result in multiple queries executed one after another, so I don't think that would help me :(
There is also PersistenceManager.getObjectsById(Collection). Does that help in any way? I can't find any documentation on how many requests this will issue.
I think you can use the call below to do a batch get:
Query q = pm.newQuery("select from " + Content.class.getName() + " where contentKey == :contentKeys");
List<Content> results = (List<Content>) q.execute(contentKeys);
Something like the above query, executed with your collection of keys, would return all the objects you need.
And you can handle all the rest from here.
Your best bet is
pm.getObjectsById(ids);
since that is intended for getting multiple objects in a single call (particularly since you have the ids, and hence the keys). Certainly the current code (2.0.1 and later) ought to do a single datastore call for getEntities(). See this issue
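For illustration, here's a minimal sketch of how the loop from the question could be restructured around a single batch read with getObjectsById(); the Item class and version accessors are hypothetical stand-ins for the question's entity and long attribute:
import java.util.Collection;
import java.util.List;

import javax.jdo.PersistenceManager;

public class BatchReadSketch {
    // Hypothetical stand-in for the real entity class (in real code this would
    // be your @PersistenceCapable class with the long attribute being compared).
    public static class Item {
        private long version;
        public long getVersion() { return version; }
        public void setVersion(long version) { this.version = version; }
    }

    @SuppressWarnings("unchecked")
    public void updateIfNewer(PersistenceManager pm, List<Object> keys, List<Long> incomingVersions) {
        // One batch get for all keys instead of ~100 single getObjectById() calls.
        // Per the JDO javadoc, the instances come back in the same order as the ids.
        Collection<Item> storedItems = (Collection<Item>) pm.getObjectsById(keys);

        int i = 0;
        for (Item storedItem : storedItems) {
            long incoming = incomingVersions.get(i++);
            if (incoming > storedItem.getVersion()) {
                // Update the managed entity; JDO persists the change when the
                // transaction commits or the PersistenceManager is closed.
                storedItem.setVersion(incoming);
            }
        }
    }
}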