I have a simple question that I can't find information on regarding Apache Camel-Quartz. For Camel-Quartz to work, do you have to deploy inside a web container like Tomcat, so that the application is always alive and therefore knows when to run?
I'm asking because if you deploy your Camel application in a standalone JVM, I don't see how the application will be smart enough to know when to run.
thanks
Quartz is embedded in your Camel application, so when you start Camel, Quartz starts as well. It then knows when to run, as long as you keep the Camel application running.
There is no magic in there. It's just Java code that runs, and Quartz is also just Java code. It does not require a special server or anything like that; Quartz is just a library (some JAR files) that you run together with your own application.
Quartz simply has the logic for scheduling jobs (think of it as a big clockwork): it knows what time it is and triggers jobs according to how you tell it to.
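To make that concrete, here is a minimal sketch of a standalone Camel application with a Quartz-scheduled route. The group/timer names, cron expression, and log message are all illustrative; the point is that the scheduler lives in the same JVM as the route and keeps firing for as long as that JVM stays up.
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class QuartzScheduledRoute {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Quartz fires this route every 5 minutes for as long as the JVM runs.
                from("quartz://myGroup/myTimer?cron=0+0/5+*+*+*+?")
                    .log("Quartz trigger fired, running scheduled work");
            }
        });
        context.start();              // starting Camel also starts the embedded Quartz scheduler
        Thread.sleep(Long.MAX_VALUE); // keep the standalone JVM alive so triggers keep firing
    }
}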
I have been a long-time user of Google App Engine's Mapreduce library for processing data in the Google Datastore. Google no longer supports it, and it doesn't work at all in Python 3. I'm trying to migrate our older Mapreduce jobs to Google's Dataflow / Apache Beam runner, but the official documentation is poor: it just describes Apache Beam and doesn't tell you how to migrate.
In particular, the issues are these:
In Mapreduce, the jobs would run on your existing deployed application. However, in Beam you have to create and deploy a custom Docker image to build the environment for Dataflow. Is this right?
To create a new job template in Mapreduce, you just need to edit a YAML file and deploy it. To create one in Apache Beam, you need to write custom runner code, stage a template file in Google Cloud Storage, and link it up with the Docker image. Is this right?
Is the above accurate? If so, is it generally the case that working with Dataflow is much more difficult than Mapreduce? Are there any libraries or tips for making this easier?
In technical terms that's what is happening, but unless you have some specific advanced use cases, you won't need to set up any custom Docker images manually. Dataflow does some work in the background to package your user code and dependencies into a container so that it can execute them on its VMs.
In Dataflow, writing a job template mainly requires writing some pipeline code in your chosen language (Java or Python) and possibly writing some metadata. Once your code is written, creating and staging the template itself isn't much different from running a normal Dataflow job. There's a page documenting the process.
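For reference, the pipeline is really just ordinary code. Below is a minimal sketch of a Beam word-count pipeline in Java; the GCS paths are placeholders, and the same code runs locally with the DirectRunner or on Dataflow when you pass --runner=DataflowRunner plus your project/staging options on the command line.
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalWordCount {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("Read", TextIO.read().from("gs://your-bucket/input/*.txt"))
         .apply("Split", FlatMapElements.into(TypeDescriptors.strings())
                 .via((String line) -> Arrays.asList(line.split("\\s+"))))
         .apply("Count", Count.perElement())
         .apply("Format", MapElements.into(TypeDescriptors.strings())
                 .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
         .apply("Write", TextIO.write().to("gs://your-bucket/output/counts"));

        p.run(); // on Dataflow this submits the job; locally it runs in-process
    }
}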
I agree the page on Mapreduce to Beam migration is very sparse and unhelpful, although I think I understand why that is. Migrating from Mapreduce to Beam isn't a straightforward 1:1 migration where only the syntax changes. It's a different pipeline model and most likely will require some level of rewriting your code for the migration. A migration guide that fully covered everything would end up repeating most of the existing documentation.
Since it sounds like most of your questions are around setting up and executing Beam pipelines, I encourage you to begin with the Dataflow quickstart in your chosen language. It won't teach you how to write pipelines, but will teach you how to set up your environment to write and run pipelines. There are links in the quickstarts which direct you to Apache Beam tutorials that teach you the Beam API and how to write your own pipelines, and those will be useful for rewriting your Mapreduce code in Beam.
I wanted to understand the limitations of LocalExecutionEnvironment and whether it can be used to run in production.
Appreciate any help/insight. Thanks
LocalExecutionEnvironment spins up a Flink MiniCluster, which runs the entire Flink system (JobManager, TaskManager) in a single JVM. So you're limited to CPU cores and memory available on that one machine. You also don't have HA from multiple JobManagers. I haven't looked at other limitations of the MiniCluster environment, but I'm sure more exist.
A LocalExecutionEnvironment doesn't load a config file on startup, so you have to do all of the configuration in the application. By default it also doesn't offer a REST endpoint. You can solve both these issues by doing something like this:
// Load flink-conf.yaml from the current working directory, since a local
// environment doesn't pick up a config file on its own.
String cwd = Paths.get(".").toAbsolutePath().normalize().toString();
Configuration conf = GlobalConfiguration.loadConfiguration(cwd);
// Create a local environment that also starts the web UI / REST endpoint.
env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
Logging may be another issue that will require a workaround.
I don't believe you'll be able to use the Flink CLI to control the job, but if you create the Web UI (as shown above) you can at least use the REST API to do things like triggering savepoints (after first using the REST API to get the job ID).
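As a rough sketch (assuming the web UI was created as above and is listening on the default port 8081), driving the job over the REST API can look like the following; the JSON handling here is deliberately crude and a real client would use a JSON library.
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class TriggerSavepoint {
    public static void main(String[] args) throws Exception {
        // 1. Ask the REST endpoint for the running job's ID.
        String jobs = read(new URL("http://localhost:8081/jobs").openStream());
        String jobId = jobs.replaceAll(".*\"id\"\\s*:\\s*\"([0-9a-f]+)\".*", "$1");

        // 2. Trigger a savepoint for that job.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8081/jobs/" + jobId + "/savepoints").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = "{\"target-directory\":\"file:///tmp/savepoints\",\"cancel-job\":false}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Savepoint trigger response: " + read(conn.getInputStream()));
    }

    private static String read(InputStream in) {
        try (Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }
}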
I have a couple of questions about the App Engine Map Reduce API. First of all there's a mapreduce package in the SDK, and there's a separate mapreduce bundle here:
https://developers.google.com/appengine/downloads
Which one should I be using? Should I be using the bundle, or is the documentation out of date and I should actually use the SDK version?
Second, I'd like to be able to run mapreduces on a non-default version to make sure that the requests from the mapreduce don't interfere with user requests.
What's the best way to do this? Can I start the pipeline with a task queue, and set the target version of that queue to be my non-default version?
We recommend using the open source version of Map Reduce for GAE at http://code.google.com/p/appengine-mapreduce/
The stale bundle link in the docs is a bug. That'll get cleaned up soon.
A few of our SDKs have bits of MapReduce (for historic reasons), but the open source version is the way to go for now.
As for using a separate version, this is kind of "it depends". If you're thinking of interference in terms of competition for the processor, that's not likely to be a noticeable issue. Depending on queue processing rates you've set up, more instances of your app will be spun up to handle mapping tasks as needed. I'd try some experiments first. Make sure you have a problem before you invest time and effort solving it.
A mapreduce can be started on a non-default version, and after it starts it will continue to run on that version automatically.
In my case I just deploy the code to a non-default version and trigger the mapreduce with version_id.app_id.appspot.com/path_to_start_a_job.
A cron job can also trigger the mapreduce on a non-default version without problems.
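If you want to script that trigger, it is nothing more than an HTTP request to the versioned hostname; the host and path below are placeholders following the pattern from the answer above.
import java.net.HttpURLConnection;
import java.net.URL;

public class StartMapreduceOnVersion {
    public static void main(String[] args) throws Exception {
        // Over HTTPS, App Engine expects "-dot-" instead of "." between version and app id.
        URL url = new URL("https://my-version-dot-my-app.appspot.com/path_to_start_a_job");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("Start-job response: HTTP " + conn.getResponseCode());
    }
}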
What I want is to run my WPF automation tests (integration tests) in the continuous integration process when possible. That means every time something is pushed to source control, I want to trigger an event that starts the WPF automation tests. However, integration tests are slower than unit tests, which is why I would like to execute them in parallel, on several virtual machines. Is there any framework or tool that allows me to run my WPF automation tests in parallel?
We use Jenkins. Our system tests are built on top of a proprietary framework written in C#.
Jenkins allows jobs to be triggered by SCM changes (SVN, Git, and Mercurial are all supported via plugins). It also allows jobs to be run on remote slaves (in parallel, if needed). You do need to configure your jobs and slaves by hand. Configuring jobs can be done with build parameters: say you have only one job that accepts test IDs as parameters, but it can run on several slaves; you can then configure one trigger job that starts several test jobs on different slaves, passing them test IDs as parameters.
Configuring slaves is made much easier when your slaves are virtual machines. You configure one VM and then copy it (make sure that node-specific information, such as Node Name is not hard-coded and is easily configurable).
The main advantages of Jenkins:
It's free
It has an extensible architecture that allows people to extend it via plugins. As a matter of fact, at this stage (unlike, say, a year and a half ago) everything I need to do can be done either via plugins or via the Jenkins HTTP API (see the sketch after this list).
It can be deployed rapidly. Jenkins runs in its own servlet container; you can deploy it and start playing with it in less than an hour.
Active community. Check, for example, [jenkins], [hudson], and [jenkins-plugins] tags on SO.
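As an illustration of the trigger-job idea, here is a small sketch that fans out parameterized builds over the Jenkins HTTP API; the Jenkins host, job name, TEST_ID parameter, and credentials are all placeholders for your own setup.
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TriggerTestJobs {
    public static void main(String[] args) throws Exception {
        String[] testIds = {"smoke", "regression", "ui"};
        String auth = Base64.getEncoder()
                .encodeToString("user:apitoken".getBytes(StandardCharsets.UTF_8));
        for (String testId : testIds) {
            // buildWithParameters starts a parameterized build remotely; Jenkins assigns the slave.
            URL url = new URL("http://jenkins.example.com/job/wpf-tests/buildWithParameters?TEST_ID=" + testId);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            System.out.println(testId + " -> HTTP " + conn.getResponseCode());
        }
    }
}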
I propose that you give it a try: play with it for a couple of days, and chances are you'll like it.
Update: here's an old answer I've liked that recommends Jenkins.
I would like my web app to log using SLF4J and Logback. However, I am using ActiveMQ, which requires that some of its jars go in /usr/share/tomcat6/lib (this is because the queues are defined outside of the web app, so the classes to support them must be at container level).
ActiveMQ 5.5+ requires slf4j-api, so that jar has to go in too. Because SLF4J is now being started at the container level, it needs a logging implementation added or it will simply no-op. Thus, logback-core and logback-classic go in too.
After quite some frustration I got this working well enough that I can tidy it up shortly. I needed to configure Logback to use a JNDI lookup to get the context. Then it can look up logback-kenobi.xml in my web app and have a separate configuration there.
However, I'm wondering if this is the best way to do this. For one, the context handling appears not to support the Groovy format. I did have a logback.groovy in my web app that logged to the console when I was developing locally (which means that Eclipse WTP works nicely) but to file and to Splunk Storm everywhere else. I'm going to want to do something similar with this setup, but I'm not sure if I should do that by overwriting logback-kenobi.xml or by some other method.
Note that I don't currently need Tomcat itself to log with SLF4J, although I am planning to do that. Nor do I really need ActiveMQ to log with SLF4J, but I did need it to stop spewing debug messages every 30s as it was doing. I am aware of tomcat-slf4j-logback, but I don't believe it is directly useful here, as it is ActiveMQ's logging that is the issue.
However, I'm wondering if this is the best way to do this.
Best is an opinion, working is a fact.