Google appengine pipelines - define the queue to use

I'd like to be able to set which queue to use within a pipeline, so that I can use custom settings for that pipeline in queue.yaml. The only way I can see to do this is when the stage is started, via:
first_stage = ingest.CustomPipelineA(some_data)
first_stage.start(queue_name='foo')
However, I have nested and pre-requisite pipelines, such as:
with pipeline.InOrder():
    yield CustomPipelineA(some_shared_data)
    future_b = yield CustomPipelineB(some_shared_data)
    with pipeline.After(future_b):
        future_c = yield CustomPipelineC(some_shared_data, future_b)
        with pipeline.After(future_c):
            future_d = yield CustomPipelineD(some_shared_data, future_c)
It would be nice if I could set the queue name on the constructor, but it's not possible based on the pipeline docs: https://code.google.com/p/appengine-pipeline/wiki/GettingStarted#Execution_ordering.
Any ideas?

I think it's possible in Python (but not in Java). Here's an example from the same page you linked to:
stage = MySearchEnginePipeline(15)
stage.start(queue_name='pipelinequeue')

I believe I've figured this out for execution ordering: within the pipeline's run method, you can set:
self._context.queue_name = "my-custom-queue-name"


How to send custom DocumentOperation to DocumentProcessing pipeline from a Processor?

Scenario: I've been stuck on this for way too long and I think the solution might be easy, but I just can't see it. This is the scenario:
cURL POST to http://localhost:8080/my_imports (raw JSON data on body)
->
MyImportsCustomHandler (extends ThreadedHttpRequestHandler) [Validations]
->
MyObjectProcessor (extends Processor) [JSON deserialize and data massage]
->
MyFirstDocumentProcessor (extends DocumentProcessor) [Set some fields and save]
The problem is that execution never reaches MyFirstDocumentProcessor, likely because the request didn't start from the document_api endpoints (intentionally).
No errors are thrown; the processing route just never reaches the document processor chain. I think it should, because in MyObjectProcessor I'm doing:
DocumentType type = localDocHandler.getDocumentTypeManager().getDocumentType("my_doc");
DocumentId id = new DocumentId("id:default:my_doc::2");
Document document = new Document(type, id);
DocumentPut docPut = new DocumentPut(document);
Processing proc = com.yahoo.docproc.Processing.of(docPut);
I got this idea from here: https://github.com/vespa-engine/vespa/blob/master/docproc/src/test/java/com/yahoo/docproc/util/SplitterJoinerTestCase.java
but in that test I see the line splitter.process(p);, for which I can't find a suitable replacement that works inside a Processor; in that context I only have the Request, Execution and DocumentProcessingHandler.
I hope somebody versed in Vespa can shine some light on this; it's just the last hop in the processing chain that I can't bridge :|
To write documents from Java code, you need to use the Document Access API:
http://docs.vespa.ai/documentation/document-api-guide.html#document-access
A working solution is in https://github.com/vespa-engine/sample-apps/pull/44
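For illustration, a minimal sketch of a put through the Document Access API, following the guide above. Hedged: the exact SyncParameters construction varies between Vespa versions, and docPut is the one built in the question.
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;
// Create an access object and a synchronous session
DocumentAccess access = DocumentAccess.createDefault();
SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());
// Write the DocumentPut built in the question, then clean up
session.put(docPut);
session.destroy();
access.shutdown();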

Why aren't my queries and batch gets executed in parallel?

Based on the documentation for Objectify and Google Cloud Datastore, I would expect the queries and the batch loads in the following code to execute in parallel:
List<Iterable<Key<MyType>>> results = new ArrayList<>();
for (...) {
    results.add(ofy().load()
            .type(MyType.class)
            .filter(...)
            .keys()
            .iterable());
}
...
Iterable<Key<MyType>> keys = ...;
Collection<MyType> c = ofy().load().keys(keys).values();
But the trace makes it look like each query and each entity load executes in sequence.
What gives?
It looks like this only happens when doing a cached get from Memcache. With similar code I see the expected async behavior for datastore_v3.Get/Put/Delete.
It seems the reason for this is that Objectify doesn't use AsyncMemcacheService. Indeed, there is an open issue for this on the project page, and this can also be confirmed by checking out the source and doing a grep -r AsyncMemcacheService.
Regarding the serial datastore_v3.RunQuery calls: calls to ofy().load().type(...).filter(...).iterable() are 'asynchronous' in that they return immediately; however, the actual Datastore queries themselves are executed serially, because the App Engine Datastore API doesn't expose an explicitly async API for queries.
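For contrast, here's a small sketch of the pattern that is genuinely async per the trace above, i.e. batch gets. MyType, keysA and keysB are stand-ins, and it assumes Objectify's load results stay lazy until first accessed:
// Both loads return immediately; the underlying datastore_v3.Get RPCs run in parallel
Map<Key<MyType>, MyType> batchA = ofy().load().keys(keysA);
Map<Key<MyType>, MyType> batchB = ofy().load().keys(keysB);
// The first access to each map blocks until its RPC completes
int total = batchA.size() + batchB.size();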

Jmeter: How to create an array in bean shell post processor and make it available in other thread groups?

Does anyone know how to create an array in a Beanshell post processor and make it available in other thread groups?
I've been searching for a while and I'm not managing to solve this.
Thanks
There is no need to do it by writing and reading files. The Beanshell extension mechanism is smart enough to handle it without interim third-party entities.
Short answer: bsh.shared namespace
Long answer:
assuming the following Test Plan structure:
Thread Group 1
    Beanshell Sampler 1
Thread Group 2
    Beanshell Sampler 2
Put the following Beanshell code into Beanshell Sampler 1:
Map map = new HashMap();
map.put("somekey","somevalue");
bsh.shared.my_cool_map = map;
And the following into Beanshell Sampler 2:
Map map = bsh.shared.my_cool_map;
log.info(map.get("somekey"));
Run it and look into the jmeter.log file. You should see something like:
2014/01/04 10:32:09 INFO - jmeter.util.BeanShellTestElement: somevalue
Voila.
References:
Sharing variables (from JMeter Best Practices)
How to use BeanShell: JMeter's favorite built-in component guide
Following some advice, here's how I did it:
The HTTP request has a Regular Expression Extractor to extract the XPTO variable. Then, a BeanShell PostProcessor saves the data to a CSV file:
String xpto_str = vars.get("XPTO");
log.info("Variable is: " + xpto_str);
// Open the CSV in append mode and route the interpreter's print() output into it
f = new FileOutputStream("/tmp/xptos.csv", true);
p = new PrintStream(f);
this.interpreter.setOut(p);
print(xpto_str + ",");
f.close();
Then, in the second thread group, I added a CSV Data Set Config, which reads the variable from the file. This is really easy; just read the guide (http://jmeter.apache.org/usermanual/component_reference.html#CSV_Data_Set_Config).
Thanks

App Engine Instance ID

Is it possible to get info on what instance you're running on? I want to output just a simple identifier for which instance the code is currently running on for logging purposes.
Since there is no language tag, and seeing your profile history, I assume you are using GAE/J?
In that case, the instance ID information is embedded in one of the environment attributes that you can get via the ApiProxy.getCurrentEnvironment() method. You can then extract the instance ID from the resulting map using the key BackendService.INSTANCE_ID_ENV_ATTRIBUTE.
Even though the key is stored in BackendService, this approach also works for frontend instances. So in summary, the following code will fetch the instance ID for you:
String tInstanceId = ApiProxy.getCurrentEnvironment()
        .getAttributes()
        .get(BackendService.INSTANCE_ID_ENV_ATTRIBUTE)
        .toString();
Please keep in mind that this approach is quite undocumented by Google and might be subject to change without warning in the future. But since your use case is only logging, I think it would be sufficient for now.
With the advent of Modules, you can get the current instance id in a more elegant way:
ModulesServiceFactory.getModulesService().getCurrentInstanceId()
Even better, wrap the call in a try/catch so that it also works correctly when running locally.
Import these:
import com.google.appengine.api.modules.ModulesException;
import com.google.appengine.api.modules.ModulesServiceFactory;
Then your method can run this:
String instanceId = "unknown";
try {
    instanceId = ModulesServiceFactory.getModulesService().getCurrentInstanceId();
} catch (ModulesException e) {
    instanceId = e.getMessage();
}
Without the try catch, you will get some nasty errors when running locally.
I have found this super useful for debugging when using Endpoints mixed with pub/sub and other bits, to work out why some things behave differently and whether it is related to new instances.
Not sure about before, but today in 2021 the system environment variable GAE_INSTANCE appears to contain the instance ID:
String instanceId = System.getenv("GAE_INSTANCE");
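A hedged variant for code that also runs on the local dev server, assuming GAE_INSTANCE is only set on deployed instances:
String instanceId = System.getenv("GAE_INSTANCE");
if (instanceId == null) {
    instanceId = "local"; // assumption: GAE_INSTANCE is unset outside App Engine
}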

AppEngine - Optimize read/write count on POST request

I need to optimize the read/write count for a POST request that I'm using.
Some info about the request:
The user sends a JSON array of ~100 items
The servlet needs to check whether any of the received items is newer than its counterpart in the datastore, using a single long attribute
I'm using JDO
What I currently do is (pseudo code):
foreach(item : json.items) {
    storedItem = persistenceManager.getObjectById(item.key);
    if (item.long > storedItem.long) {
        // Update storedItem
    }
}
Which obviously results in ~100 read requests per request.
What is the best way to reduce the read count for this logic? Using a JDO query? I read that "IN" queries simply result in multiple queries executed one after another, so I don't think that would help me :(
There is also PersistenceManager.getObjectsById(Collection). Does that help in any way? I can't find any documentation on how many requests it will issue.
I think you can use the query below to do a batch get:
Query q = pm.newQuery("select from " + Content.class.getName() + " where contentKey == :contentKeys");
Something like the above query would return all the objects you need, and you can handle the rest from there.
Your best bet is
pm.getObjectsById(ids);
since that is intended for getting multiple objects in a single call (particularly since you have the ids, hence keys). Current code (2.0.1 and later) certainly ought to do a single datastore call for getEntities(). See this issue.
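As a sketch against the asker's pseudocode (json.items and item.key are the asker's names; Item stands in for the JSON element type):
// Gather all keys once, then resolve them with a single batch get
List<Object> ids = new ArrayList<>();
for (Item item : json.items) {
    ids.add(item.key);
}
Collection storedItems = pm.getObjectsById(ids); // one batch call instead of ~100 gets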
