When I submit a task from one version of my app, the task ends up executing in a different version. How do I make the task execute in the same version that enqueued it?
Note:
I tried setting 'target' in queue.xml, but the result is the same: tasks are executed in seemingly random versions, not always the same one.
What's wrong with my setup?
[UPDATE]
<queue>
  <name>shopinionMessage</name>
  <rate>10/s</rate>
  <retry-parameters>
    <task-retry-limit>60</task-retry-limit>
    <min-backoff-seconds>1</min-backoff-seconds>
    <max-backoff-seconds>30</max-backoff-seconds>
    <max-doublings>0</max-doublings>
  </retry-parameters>
  <target>2</target>
</queue>
https://developers.google.com/appengine/docs/java/config/queue#target says that target is
A string naming a module/version, a frontend version, or a backend, on which to execute this task.
Do you have modules perhaps? If yes, you should try my-version.my-module as target; unfortunately you won't have any luck with that either as of now: https://code.google.com/p/googleappengine/issues/detail?id=10954
By the way, without a target it shouldn't be random regarding where the task is executed:
If target is unspecified, then tasks are invoked on the same version of the application where they were enqueued. So, if you enqueued a task from the default application version without specifying a target on the queue, the task is invoked in the default application version. Note that if the default application version changes between the time that the task is enqueued and the time that it executes, then the task will run in the new default version.
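As a concrete illustration, here is a minimal sketch of pinning a single task to a version from the Java Task Queue API. As far as I know, TaskOptions has no target() setter (unlike the Python API), so the usual workaround is to override the Host header; the worker URL and the application ID "myapp" below are placeholders:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Route this task to version "2" by overriding the Host header
// ("myapp" is a placeholder application ID).
Queue queue = QueueFactory.getQueue("shopinionMessage");
queue.add(TaskOptions.Builder
        .withUrl("/worker/process")
        .header("Host", "2-dot-myapp.appspot.com"));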
Recently I upgraded my Zeppelin from 0.8.1 to 0.9.0-preview (and also upgraded Spark from 2.2 to 3.0.1).
Now I am not able to execute notebooks in parallel (whether by the same user or by different users). The first executed notebook submits a job to Spark and keeps running, while all other notebooks show as waiting.
Even after the first notebook completes successfully, the other notebooks are still unable to execute.
I was able to run multiple notebooks simultaneously in the previous version.
The setting in the Zeppelin interpreter is:
You get only one session when you share your session globally. That means every executed paragraph is queued and processed sequentially as shown in the picture below:
Depending on your working environment, you should change your setting to per note or per user (in case of a multiuser environment) and to scoped or isolated mode.
Below is an overview from the official documentation of the advantages and disadvantages of the shared, isolated, and scoped mode from a notebook perspective:
I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production?
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this (note that createLocalEnvironmentWithWebUI requires flink-runtime-web on the classpath):
import java.nio.file.Paths;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
StreamExecutionEnvironment env = StreamExecutionEnvironment
        .createLocalEnvironmentWithWebUI(GlobalConfiguration.loadConfiguration(cwd));
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
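For example, here is a rough sketch of that REST flow using Java 11's HttpClient, assuming the web UI is on Flink's default port 8081; the job ID and savepoint directory below are placeholders you'd fill in from the /jobs response:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SavepointTrigger {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // GET /jobs lists the running jobs as JSON; parse the job ID out of it
        HttpRequest listJobs = HttpRequest.newBuilder(URI.create("http://localhost:8081/jobs")).GET().build();
        System.out.println(client.send(listJobs, HttpResponse.BodyHandlers.ofString()).body());

        // POST /jobs/<jobid>/savepoints triggers an asynchronous savepoint
        String jobId = "<job-id-from-previous-call>"; // placeholder
        HttpRequest savepoint = HttpRequest.newBuilder(URI.create("http://localhost:8081/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"target-directory\":\"file:///tmp/savepoints\",\"cancel-job\":false}"))
                .build();
        System.out.println(client.send(savepoint, HttpResponse.BodyHandlers.ofString()).body());
    }
}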
I have a simple question that I can't find information on regarding Apache Camel-Quartz. For Camel-Quartz to work, do you have to deploy inside a web container like Tomcat? And is it because the application is always alive that it knows when to run?
I'm asking because if you deploy your Camel application in a standalone JVM, I don't see how the application will be smart enough to understand when to run.
thanks
Quartz is embedded in your Camel application, and thus when you start Camel, Quartz is also started. It then knows when to run, as long as you keep the Camel application running.
There is no magic in there. It's just Java code that runs, and Quartz is also just Java code. It does not require a special server; Quartz is just a library (some JAR files) that you run together with your own application.
Quartz just has the logic for scheduling jobs (it's like a big clockwork): it knows what the time is and triggers jobs according to how you tell it to.
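To make that concrete, here is a minimal standalone sketch (assuming Camel 3.x with camel-main and camel-quartz on the classpath; the group/timer names and cron expression are just examples):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class QuartzStandalone {
    public static void main(String[] args) throws Exception {
        Main main = new Main(); // plain JVM, no web container involved
        main.configure().addRoutesBuilder(new RouteBuilder() {
            @Override
            public void configure() {
                // Quartz fires this route every 10 seconds for as long as this JVM runs
                from("quartz://myGroup/myTimer?cron=0/10+*+*+*+*+?")
                    .log("Quartz trigger fired");
            }
        });
        main.run(); // blocks here, keeping Camel (and the embedded Quartz scheduler) alive
    }
}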
I have a couple of questions about the App Engine Map Reduce API. First of all there's a mapreduce package in the SDK, and there's a separate mapreduce bundle here:
https://developers.google.com/appengine/downloads
Which one should I be using? Should I be using the bundle, or is the documentation out of date and I should actually use the SDK version?
Second I'd like to be able to run mapreduce's on a non-default version to make sure that the requests from the mapreduce don't interfere with user requests.
What's the best way to do this? Can I start the pipeline with a task queue, and set the target version of that queue to be my non-default version?
We recommend using the open source version of Map Reduce for GAE at http://code.google.com/p/appengine-mapreduce/
The stale bundle link in the docs is a bug. That'll get cleaned up soon.
A few of our SDKs have bits of MapReduce (for historic reasons), but the open source version is the way to go for now.
As for using a separate version, this is kind of "it depends". If you're thinking of interference in terms of competition for the processor, that's not likely to be a noticeable issue. Depending on queue processing rates you've set up, more instances of your app will be spun up to handle mapping tasks as needed. I'd try some experiments first. Make sure you have a problem before you invest time and effort solving it.
A MapReduce can be started on a non-default version, and after it starts, it will continue to run on that version automatically.
In my case I just deploy the code on a non-default version and trigger the MapReduce with version_id.app_id.appspot.com/path_to_start_a_job.
A cron job can also trigger the MapReduce on a non-default version without problems.
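As for the task-queue idea from the question: per the queue.xml target reference quoted in the first question above, you can dedicate a queue to your MapReduce version and enqueue the pipeline-start task on it. A sketch, with placeholder queue and version names:

<queue>
  <name>mapreduce-queue</name>
  <rate>5/s</rate>
  <target>my-mapreduce-version</target>
</queue>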
Recently I've started using limited staging on my Google App Engine project. The data is still shared between all versions, but behaviour (especially user facing behaviour) is different.
Naturally when I implement something incredibly new it only runs on the latest version of my code and I don't feel like it should be backported to the older versions.
Some of this new functionality requires cron jobs to be run periodically, but I'm hitting a problem. I have to run a cron job to call the latest code, but this is what Google's documentation has to say about the issue:
Cron requests are always sent to the default version of the application.
The default version is the oldest because the first versions of the client code that went out to users weren't future-proof and don't know how to select which API version to call.
So my question is, how can I get around this limitation and make a cron job that will call the latest rather than the default version of the application?
You can now specify a version using the target tag.
<target>version-2</target>
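A full entry in cron.xml would look something like this (the URL and schedule below are placeholders):

<cron>
  <url>/tasks/nightly_job</url>
  <schedule>every 24 hours</schedule>
  <target>version-2</target>
</cron>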
You can't change cron jobs to run on a different version than the default.
Depending on how much time your cron job takes to run, you could change your cron job handler to do a URLFetch to "http://latest.appname.appspot.com/cron_job_endpoint", as in the sketch below.
If your cron job takes longer than 10 minutes to run, I would design it so that you can chain the different tasks using task queues.
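Here is a sketch of that URLFetch workaround: the cron entry points at a handler like this on the default version, which just forwards the request (the hostname and endpoint path are the placeholders from above):

import java.io.IOException;
import java.net.URL;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

public class CronForwardServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Forward the cron hit to the latest deployed version via URLFetch
        HTTPResponse res = URLFetchServiceFactory.getURLFetchService()
                .fetch(new URL("http://latest.appname.appspot.com/cron_job_endpoint"));
        resp.setStatus(res.getResponseCode());
    }
}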