In my application I have to send data from Vespa at a particular interval. How should I achieve this? Is there a cron job or scheduler service for this?
Please help.
There is no built-in scheduler, but since you are presumably already familiar with developing container applications, you can run your own scheduler inside one and export documents or perform queries from it.
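For illustration, a minimal sketch in plain Java of the kind of scheduler you could run inside a container component; nothing here is Vespa-specific, and sendData() is a placeholder for your own document export or query logic:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicSender {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Fire sendData() every 10 minutes (the interval is an assumption).
        scheduler.scheduleAtFixedRate(this::sendData, 10, 10, TimeUnit.MINUTES);
    }

    private void sendData() {
        // Placeholder: export documents or run queries against your application here.
    }

    public void stop() {
        // Call this from the component's shutdown hook so tasks don't leak.
        scheduler.shutdownNow();
    }
}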
Related
I want to write a task that is triggered by Apache Flink every 24 hours and then processed by Flink. What is the possible way to do this? Does Flink provide any job-scheduling functionality?
Apache Flink is not a job scheduler but an event-processing engine, which is a different paradigm: Flink jobs are meant to run continuously rather than be triggered by a schedule.
That said, you could achieve the functionality by using an off-the-shelf scheduler (e.g. cron) that starts a job on your Flink cluster and then stops it, either after you receive some sort of notification that the job is done (e.g. through a Kafka topic) or simply after a timeout, at which point you assume the job is finished. But again, precisely because Flink is not designed for this kind of use case, you would almost certainly run into edge cases that Flink does not support.
Alternatively, you can simply use a 24-hour tumbling window and run your task in the corresponding window function. See https://flink.apache.org/news/2015/12/04/Introducing-windows.html for details on that matter.
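As a rough illustration (a sketch only, against the Java DataStream API; the socket source, the single key, and the reduce step are placeholder assumptions), the window function takes the role of the daily task:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class DailyWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)                   // placeholder source
           .keyBy(line -> 0)                                      // one logical key = one daily task
           .window(TumblingProcessingTimeWindows.of(Time.days(1)))
           .reduce((a, b) -> a + "\n" + b)                        // collect 24 hours of input
           .print();                                              // placeholder for the real task

        env.execute("daily-window-sketch");
    }
}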
I have a Flink batch job. What is the best way to run it continuously? (It needs to restart when it finishes, because the streaming job can provide new data.)
I want to restart the job immediately once it has finished.
An infinite loop that calls the tasks from inside?
A bash script that keeps pushing the job to the JobManager? (I think that is a really big waste of resources.)
Thanks
In a similar use case, where we run a Flink job against the same collection, we trigger a new job at periodic intervals (daily, hourly, etc.). https://azkaban.github.io/ can be used for the scheduling. This is NOT exactly what you described, but it is a close match that might be sufficient for your use case.
I have been using Google App Engine for Java for over a year, trying to write a simple strategy game.
Unfortunately, my current solution does not satisfy me, and my experience with GAE is still insufficient.
My problem is simulating a battle at a specific time; the simulation should be calculated in the task queue.
Current solution:
I have this entity in the Datastore:
BattleInfo (fields: battleID, startTime, userA, userB)
I have prepared two servlets: BattleInitServlet and BattleServlet.
BattleInitServlet - executes a query against the Datastore and loads the BattleInfo entities. Tasks are then added to the battle-queue:
Queue queue = QueueFactory.getQueue("battle-queue");
...
for (BattleInfo i : list) {
    // Delay each task so it fires at the battle's start time.
    long countdownMillis = i.getStartTime() - currentTime;
    queue.add(TaskOptions.Builder
            .withUrl("/battle")
            .param("battleID", i.getBattleID())
            .method(TaskOptions.Method.POST)
            .countdownMillis(countdownMillis));
}
BattleServlet - the task starts at the specified time. It retrieves the BattleInfo for the given battleID, loads the required data, simulates the battle, and saves the result to the Datastore.
BattleInitServlet is executed every 30 minutes using cron.
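For reference, that schedule would live in cron.xml along these lines (the /battleinit URL is an assumption):

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/battleinit</url>
    <description>Enqueue upcoming battles</description>
    <schedule>every 30 minutes</schedule>
  </cron>
</cronentries>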
With a few simulations the result is OK. In tests with a large number of simulations, the task queue gets clogged. I can work around this by changing the queue settings, but that increases costs and does not solve the underlying problem. I do not know how to speed this up and optimize it.
I tested the Pipeline API, which is fast. The problem is that a pipeline starts the moment it is initiated and executes its tasks immediately.
I want to set up a pipeline of tasks that runs in the background and executes each battle at its specified time. I just don't know how to write that with GAE and tasks.
Has anyone had a similar problem?
Please help me. Thank you.
Can't you link the Pipeline API with your Cron Job to delay execution?
FYI, I believe the Pipeline API also leverages Task Queues internally, but uses sharding and slicing to optimize performance.
I am new to the community and looking forward to being a contributing member. I wanted to throw this out there and see if anyone has any advice:
I am currently in the middle of developing an MVC 3 app that controls various SQL jobs. It basically allows users to schedule jobs to run in the future, but also allows them to run jobs on demand.
I was thinking of having a thread run in the web app that pulls entity information into an XML file, and writing a Windows service that monitors this file and performs the requested jobs. Does this sound like a good method? Has anyone done something like this before or used a different approach? Any advice would be great. I will keep the forum posted on progress and practices.
Thanks
I can see you running into some issues using a file for complex communication between processes - files can generally only be written by one process at a time, so what happens if the worker process tries to remove a task at the same time as the web process tries to add a task?
A better approach would be to store the tasks in a database that is accessible to both processes - a database can be written to by multiple processes, and it is easy to select all tasks that have a scheduled date in the past.
With a database you don't get to use FileSystemWatcher, which I suspect is one of the main reasons you want a file. If you really need jobs to run instantly there are various sorts of messaging you could use, but for most purposes you can just check the queue table on a timer.
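To make the timer-and-queue-table idea concrete, here is a rough sketch (in Java rather than .NET, purely for illustration; the job_queue table and its columns are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueuePoller {
    public static void main(String[] args) {
        // Poll the queue table every 30 seconds (the interval is an assumption).
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(QueuePoller::pollOnce, 0, 30, TimeUnit.SECONDS);
    }

    static void pollOnce() {
        String sql = "SELECT id, job_name FROM job_queue "
                   + "WHERE status = 'PENDING' AND run_at <= CURRENT_TIMESTAMP";
        try (Connection c = DriverManager.getConnection("jdbc:...");  // connection string elided
             PreparedStatement ps = c.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Claim the row (update its status inside a transaction to
                // avoid double execution), then run the job.
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}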
Let's say I have thousands of jobs to perform repeatedly; how would you propose I architect my system on Google App Engine?
I need to be able to add more jobs while scaling the system effectively. Scheduled Tasks are of course part of the solution, as are Task Queues, but I am looking for more insight into how best to utilize these resources.
NOTE: There are no dependencies between "jobs".
Based on what little description you've provided, it's hard to say. You probably want to use the Task Queue, and maybe the deferred library if you're using Python. All that's required is to call the API to enqueue a task.
If you're talking about having many repeating tasks, you have a couple of options:
Start off the first task on the task queue manually, and use 'chaining' to have each invocation enqueue the next one with the appropriate countdown (see the sketch after this list).
Store each schedule in the datastore. Have a cron job regularly scan for any tasks that have reached their ETA; fire off a task queue task for each, updating the ETA for the next run.
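A minimal sketch of the chaining option, using the same Task Queue API as the snippet earlier on this page; the /chained-job URL and the one-hour interval are assumptions:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ChainedJobServlet extends HttpServlet {
    private static final long INTERVAL_MILLIS = 60 * 60 * 1000; // hypothetical 1-hour interval

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        runJob(); // do the actual work first

        // Re-enqueue this same task so it fires again after the interval.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/chained-job")
                .method(TaskOptions.Method.POST)
                .countdownMillis(INTERVAL_MILLIS));
    }

    private void runJob() {
        // Placeholder for the repeated work.
    }
}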
I think you could use Cron Jobs.
Regards.