AppEngine - Optimize read/write count on POST request

I need to optimize the read/write count for a POST request that I'm using.
Some info about the request:
The user sends a JSON array of ~100 items
The servlet needs to check whether any of the received items is newer than its counterpart in the datastore, using a single long attribute
I'm using JDO
What I currently do is (pseudocode):
foreach (item : json.items) {
    storedItem = persistenceManager.getObjectById(item.key);
    if (item.long > storedItem.long) {
        // update storedItem
    }
}
Which obviously results in ~100 read requests per request.
What is the best way to reduce the read count for this logic? Using a JDO query? I read that "IN" queries simply result in multiple queries executed one after another, so I don't think that would help me :(
There is also PersistenceManager.getObjectsById(Collection). Does that help in any way? I can't find any documentation on how many requests it will issue.

I think you can use a query like the one below to do a batch get:
Query q = pm.newQuery("select from " + Content.class.getName() + " where contentKey == :contentKeys");
A query like the one above should return all the objects you need in a single batch; you can handle the rest from there.
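For illustration, a minimal sketch of how that query might be executed, assuming the Content class and contentKey field from the snippet above, and a hypothetical item structure for the incoming JSON:

// Collect the datastore keys of the incoming items (item.key is assumed).
List<Key> keys = new ArrayList<>();
for (Item item : json.items) {
    keys.add(item.key);
}
// A JDOQL query with a collection parameter acts as a batch get.
Query q = pm.newQuery("select from " + Content.class.getName()
        + " where contentKey == :contentKeys");
List<Content> stored = (List<Content>) q.execute(keys);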

Your best bet is
pm.getObjectsById(ids);
since that is intended for getting multiple objects in a single call (particularly since you have the ids, and hence the keys). Current code (2.0.1 and later) certainly ought to make a single datastore call to getEntities(). See this issue
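As a minimal sketch, assuming the incoming items carry datastore keys and Info is a placeholder name for your persistent class, the ids can be built as JDO identities and fetched in one batch:

// Turn the datastore keys into JDO identity objects.
List<Object> ids = new ArrayList<>();
for (Item item : json.items) {
    ids.add(pm.newObjectIdInstance(Info.class, item.key));
}
// One call, intended to issue a single batch datastore get.
Collection storedItems = pm.getObjectsById(ids);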

Related

How to send custom DocumentOperation to DocumentProcessing pipeline from a Processor?

Scenario: I've been stuck on this for way too long, and I think the solution might be easy but I just can't see it. This is the scenario:
cURL POST to http://localhost:8080/my_imports (raw JSON data on body)
->
MyImportsCustomHandler (extends ThreadedHttpRequestHandler) [Validations]
->
MyObjectProcessor (extends Processor) [JSON deserialize and data massage]
->
MyFirstDocumentProcessor (extends DocumentProcessor) [Set some fields and save]
The problem is that execution never reaches MyFirstDocumentProcessor, likely because the request didn't start from the document_api endpoints (intentionally).
No errors are thrown; the processing route just never reaches the document processor chain. I think it should, because in MyObjectProcessor I'm doing:
DocumentType type =
localDocHandler.getDocumentTypeManager().getDocumentType("my_doc");
DocumentId id = new DocumentId("id:default:my_doc::2");
Document document = new Document(type, id);
DocumentPut docPut = new DocumentPut(document);
Processing proc = com.yahoo.docproc.Processing.of(docPut);
I got this idea from here: https://github.com/vespa-engine/vespa/blob/master/docproc/src/test/java/com/yahoo/docproc/util/SplitterJoinerTestCase.java
but in that test I see the line splitter.process(p), for which I'm not able to find a suitable replacement that works inside a Processor; in that context I only have the Request, the Execution and the DocumentProcessingHandler.
I hope somebody versed in Vespa can shed some light on this; it's just the last hop in the processing chain that I can't bridge :|
To write documents from Java code, you need to use the Document Access API:
http://docs.vespa.ai/documentation/document-api-guide.html#document-access
A working solution is in https://github.com/vespa-engine/sample-apps/pull/44
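As a rough sketch of that approach, assuming the "my_doc" document type from the question (API names per the guide linked above; error handling omitted):

// Obtain a DocumentAccess and a synchronous session for writing.
DocumentAccess access = DocumentAccess.createDefault();
SyncSession session = access.newSyncSession(new SyncParameters.Builder().build());
// Build the document exactly as in the question, then put it.
DocumentType type = access.getDocumentTypeManager().getDocumentType("my_doc");
Document document = new Document(type, new DocumentId("id:default:my_doc::2"));
session.put(new DocumentPut(document));
// Clean up when done.
session.destroy();
access.shutdown();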

Why aren't my queries and batch gets executed in parallel?

Based on the documentation for Objectify and Google Cloud Datastore, I would expect the queries and the batch loads in the following code to execute in parallel:
List<Iterable<Key<MyType>>> results = new ArrayList<>();
for (...) {
    results.add(ofy().load()
            .type(MyType.class)
            .filter(...)
            .keys()
            .iterable());
}
...
Iterable<Key<MyType>> keys = ...;
Collection<MyType> c = ofy().load().keys(keys).values();
But the trace makes it look like each query and each entity load executes in sequence. What gives?
It looks like this only happens when doing a cached get from Memcache. With similar code, I see the expected async behavior for datastore_v3.Get/Put/Delete.
It seems the reason for this is that Objectify doesn't use AsyncMemcacheService. Indeed, there is an open issue for this on the project page, and this can also be confirmed by checking out the source and doing a grep -r AsyncMemcacheService.
Regarding the serial datastore_v3.RunQuery calls, calls to ofy().load().type(...).filter(...).iterable() are 'asynchronous' in that they return immediately, however the actual Datastore queries themselves get executed serially as the App Engine Datastore API doesn't expose an explicitly async API for queries.
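For contrast, a small sketch of the pattern where the async behavior is visible, using batch gets by key instead of queries (keysA and keysB are illustrative):

// Both batch gets are dispatched immediately; datastore_v3.Get is async,
// so the two RPCs can be in flight at the same time.
Map<Key<MyType>, MyType> a = ofy().load().keys(keysA);
Map<Key<MyType>, MyType> b = ofy().load().keys(keysB);
// The results only materialize when the maps are actually read.
Collection<MyType> values = a.values();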

Make a solr query from Geotools through geoserver

As the title says, I am trying to run a query from GeoTools (through GeoServer) to get features from a Solr index.
To be more precise: I saw in the GeoServer user manual that I can query Solr over HTTP like this:
http://localhost:8080/geoserver/wfs?service=WFS&version=1.1.0&request=GetFeature
&typeName=mySolrLayer
&format="xxx"
&viewparams=q:"mySolrQuery"
The important part of this URL is viewparams, which I want to set directly from GeoTools.
I have already tested this case (this is a part of my code):
URL url = new URL(
        "http://localhost:8080/geoserver/wfs?request=GetCapabilities&VERSION=1.1.0");
Map<String, Object> params = new HashMap<>();
params.put(WFSDataStoreFactory.URL.key, url);
Map<String, String> viewParams = new HashMap<>();
viewParams.put("q", "myquery");
Hints hints = new Hints();
hints.put(Hints.VIRTUAL_TABLE_PARAMETERS, viewParams);
query.setHints(hints);
...
featureSource.getFeatures(query);
But this doesn't seem to work: the URL sent to GeoServer is a normal GetFeature request without the viewparams parameter.
I tried this with geotools-12.2, geotools-13.2, and geotools-15-SNAPSHOT, but I didn't succeed in passing the query; GeoServer sends me all the features in my database and doesn't take "viewparams" as a parameter.
I need to do it this way because the query actually comes from another program, and I want to pass it along easily to another part of the project...
Can someone help me?
There doesn't currently seem to be a way to do this in GeoTools' WFSDataStore implementations, as the GetFeature request is constructed from the URL provided by the GetCapabilities document. This is what the standard requires, but it may be worth filing a feature enhancement request to allow clients to override this string (as QGIS does, for example); that would let you specify the additional parameter in your base URL, which would then be passed to the server as you need.
Unfortunately, the WFS module currently lives in unsupported land, so unless you have resources to work on this issue yourself and can provide a PR to implement it, there is not a great chance of it being implemented.

Cassandra query returns empty array on first query but non-empty array on subsequent queries

I am trying to do a simple query against a Cassandra cluster using the Node.js Cassandra driver. I am following the examples, but it seems like my callback functions are getting called even if there are no results in the returned set.
var q = function (param) {
    var query = 'SELECT * FROM my_table WHERE my_col=?';
    var params = [param];
    console.log("Cassandra query is being called with parameters " + params);
    client.execute(query, params, function (err, result) {
        if (err != null) {
            console.log("Got error");
            return; // don't touch result on error
        }
        console.log("Cassandra callback is being called. Row count is " + result.rows.length);
        if (result.rows.length <= 0) {
            console.log("Sending empty response.");
        }
        else {
            console.log("Sending non-empty response.");
        }
    });
};
q('value');
q('value');
Outputs
Cassandra query is being called with parameters value
Cassandra query is being called with parameters value
Cassandra callback is being called. Row count is 0
Sending empty response.
Cassandra callback is being called. Row count is 1
Sending non-empty response.
This happens fairly consistently, but sometimes both calls come up empty, and sometimes both calls return values.
I guess I'm doing something wrong with the async calls here, but I'm not really sure what it is.
I think this was due to a bad node in my cluster. One of the nodes was missing a bunch of data, and so one third of the time I would get back no results.
When you open the connection, do you send down a consistency command?
I had this issue as well. When I open a connection, I call execute and send use <keyspace>; followed by consistency local_quorum;. This tells Cassandra to look a little harder for the answer and agree with other nodes that you are getting the latest version.
Note: I keep my connections around for a while, so this doesn't add any meaningful overhead (for me).
Note 2: The above doesn't seem to work from my Python app anymore (using Cassandra 3.5). However, using SimpleStatement and setting the consistency before calling execute does: https://datastax.github.io/python-driver/getting_started.html#setting-a-consistency-level
I hope this helps.
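With the Node.js driver used in the question, the same idea can be expressed per query; a minimal sketch, assuming the cassandra-driver package:

var cassandra = require('cassandra-driver');
// Ask for LOCAL_QUORUM so the coordinator has to agree with a majority of
// local replicas instead of answering from a single (possibly stale) node.
client.execute(query, params,
    { consistency: cassandra.types.consistencies.localQuorum },
    function (err, result) {
        // handle err / result as before
    });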

AppEngine - JDO - a single call to a batch get query.execute(...) raises several datastore_v3.Get calls

I have implemented an AppEngine Java app. It runs just fine, except that I'm seeing way too many datastore read operations.
So I installed the Appstats tool to analyze it. It showed that at one point in my code, a single request does:
Query query = persistenceManager.newQuery(Info.class, ":keys.contains(key)");
List<Info> storedInfos = (List<Info>) query.execute(keys);
That single call to execute(...) results in multiple datastore_v3.Get calls. I get this stack trace multiple times:
com.google.appengine.tools.appstats.Recorder:297 makeAsyncCall()
com.google.apphosting.api.ApiProxy:184 makeAsyncCall()
com.google.appengine.api.datastore.DatastoreApiHelper:59 makeAsyncCall()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:351 doBatchGetBySize()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:400 doBatchGetByEntityGroups()
com.google.appengine.api.datastore.AsyncDatastoreServiceImpl:292 get()
com.google.appengine.api.datastore.DatastoreServiceImpl:87 get()
com.google.appengine.datanucleus.WrappedDatastoreService:90 get()
com.google.appengine.datanucleus.query.DatastoreQuery:374 executeBatchGetQuery()
com.google.appengine.datanucleus.query.DatastoreQuery:278 performExecute()
com.google.appengine.datanucleus.query.JDOQLQuery:164 performExecute()
org.datanucleus.store.query.Query:1791 executeQuery()
org.datanucleus.store.query.Query:1667 executeWithArray()
org.datanucleus.api.jdo.JDOQuery:243 execute()
de.goddchen.appengine.app.InfosServlet:78 doPost()
It is even calling executeBatchGetQuery, so why is this issued multiple times?
I have already tried out some datastore/PersistenceManager settings, but none helped :(
Any ideas?
You're probably seeing the keys being broken into groups and the query being executed asynchronously. How many keys are you querying for?
