Let's assume that I have a File Consumer that polls a directory every 10 seconds and does some sort of processing to the files it has found there.
This processing may take 40 seconds per file. Does that mean the Consumer will poll the directory again during that interval and start another, similar process in parallel?
Is there any way I can avoid that, and not allow the Consumer to poll if the previous poll has not finished?
The file consumer is single-threaded, so it will not poll while it is already processing files.
When the consumer finishes, it will delay for 10s before polling again. This is controlled by the useFixedDelay option; you can read more about the semantics in the JDK ScheduledExecutorService, which Camel uses as the scheduler.
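For illustration, here is a minimal sketch of such a route in the Java DSL (the directory path, the 40-second sleep, and the class name are placeholders). Because the consumer is single-threaded and useFixedDelay is true, the 10-second delay only starts counting once the processing of the current poll has completed:

    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.impl.DefaultCamelContext;

    public class FixedDelayPollExample {
        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    // useFixedDelay=true (the default) schedules the next poll
                    // 10s after the previous poll, including the slow per-file
                    // processing, has finished.
                    from("file:/data/in?delay=10000&useFixedDelay=true")
                        .process(exchange -> Thread.sleep(40_000)) // stand-in for the 40s work
                        .to("log:done");
                }
            });
            context.start();
            Thread.sleep(120_000); // let a couple of polls run, then shut down
            context.stop();
        }
    }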
I understand that the select call basically puts the process into a wait state until one of the file descriptors it is asked to monitor becomes ready.
In contrast to constantly checking the file descriptors until one is ready, the select call performs better because it doesn't burn CPU cycles on polling.
But how does it really work underneath? How does the monitoring part work without constantly checking each file descriptor's status? If the file descriptor is a socket, the NIC could trigger an interrupt, but how would it work for regular files, or for the stdin and stdout streams?
The core of select is a system call. It tells the operating system the process wants to wait for activity on the file descriptors. The operating system updates its records to show the process is in a waiting state, not ready to run, and it does not run the process until something happens.
Then the operating system goes on to do other things. It runs other processes on the processor(s), it responds to device interrupts, and so on. When there is nothing left for the system to do (no processes are ready to run that are not already running on one of the processors, and all interrupts have been serviced), the operating system executes some sort of wait or sleep instruction that lets the processor go dormant for a time.
When the processor is in a wait or sleep state, the hardware will wake it when an interrupt arrives.
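To make this concrete, here is a minimal sketch in Java, whose NIO Selector sits on top of exactly these kernel multiplexing facilities (select, poll, epoll, or kqueue, depending on the platform). The port number and class name are arbitrary:

    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class SelectSketch {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                // Blocks in the kernel: the thread is marked waiting and uses
                // no CPU until an interrupt-driven event readies a channel.
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        System.out.println("connection from " + client.getRemoteAddress());
                        client.close(); // accept and drop, just to demonstrate readiness
                    }
                }
                selector.selectedKeys().clear(); // selected keys must be removed manually
            }
        }
    }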
Here is my question:
Will Flink finish the sink process and rename the .inprogress files to part-x-x files when a stop command is sent?
I find that my Flink jobs (using flink-1.9.1) do not rename the .inprogress files to part-x-x files. But when I read the source code, the javadoc of org.apache.flink.client.program.ClusterClient#stopWithSavepoint says:
* Stops a program on Flink cluster whose job-manager is configured in this client's configuration.
* Stopping works only for streaming programs. Be aware, that the program might continue to run for
* a while after sending the stop command, because after sources stopped to emit data all operators
* need to finish processing.
The StreamingFileSink does have some limitations in this regard. See this thread from the user@flink.apache.org mailing list.
FLIP-46, which is being tracked as FLINK-13103, is needed in order to fix this. Until then, the StreamingFileSink will remain unable to transition unfinished files to the finished state when a job is stopped. This is described in the documentation as Important Note 2.
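For reference, stopping with a savepoint is typically triggered from the CLI like this (the savepoint directory here is illustrative); even so, on 1.9.1 the .inprogress files are left behind until the fix above lands:

    ./bin/flink stop -p /tmp/flink-savepoints <jobId>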
Spark Streaming provides an API for termination, awaitTermination(). Is there any similar API available to gracefully shut down a Flink streaming job after some t seconds?
Your driver program (i.e. the main method) in Flink doesn't stay running while the streaming job executes. Your program should define a dataflow, call execute, and then terminate. In Spark, the driver program stays running (AFAIK), and awaitTermination relates to that.
Note that a Flink streaming dataflow continues to execute indefinitely, unless you're using a 'bounded' data source with a finite number of elements. You may also cancel or stop a job, and even take a checkpoint upon stopping to be resumed from later.
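As a minimal sketch of that pattern, here is a job with a bounded source, so it terminates on its own (the class and job names are made up):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExecutePattern {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Define the dataflow; nothing runs yet.
            env.fromElements(1, 2, 3)
               .map(x -> x * 2)
               .print();
            // Submit the dataflow; with this bounded source the job finishes
            // on its own, whereas an unbounded source would run indefinitely.
            env.execute("bounded-example");
            // Reached only after the job has terminated.
        }
    }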
I have a C program which communicates with PHP through Unix Sockets. The procedure is as follows: PHP accepts a file upload by the user, then sends a "signal" to C which then dispatches another process (fork) to unzip the file (I know this could be handled by PHP alone, this is just an example; the whole problem is more complex).
The problem is that I don't want more than, say, 4 processes running at the same time. I think this could be solved like this: when C gets a new "task" from PHP, it dumps it on a queue and handles the tasks one by one (ensuring that no more than 4 are running) while still listening on the socket.
I'm unsure how to achieve this though, as I cannot do that in the same process (or can I)? I have thought I could have another child process for managing the queue which would be accessible by the parent with the use of shared memory, but that seems overly complicated. Is there any other way around?
Thanks in advance.
If you need to have a separate process for each task handler, then you might consider having five separate processes. The first one is the listener; it handles new incoming tasks and places them into a queue. Each task handler initially sends a request for work, and sends another whenever it finishes processing a task. When the listener receives such a request, it delivers the next task on the queue to the task handler, or, if the task queue is empty, places the task handler on a handler queue. When the task queue transitions from empty to non-empty, the listener checks whether there is a ready task handler in the handler queue. If so, it takes that handler out of the queue and delivers the new task to it.
The PHP process would put tasks to the listener, while the task handlers would get tasks from the listener. The listener simply waits for put or get requests, and processes them. You can think of the listener as a simple web server, but each of the socket connections to the PHP process and to each task handler can be persistent.
Since the number of sockets is small and they are persistent, any of the multiplexing calls could work (select, poll, epoll, kqueue, or whatever is best and/or available on your system), but it may be easiest to use a separate thread to handle each socket synchronously. The ready-task-handler queue would then become a semaphore or a condition variable on the task queue. The thread that handles puts from the PHP process would place tasks on the task queue and up the semaphore. Each thread that handles ready tasks would down the semaphore, then take a task off the task queue. The task queue itself may need mutually exclusive protection depending on how it is implemented.
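To illustrate the queue mechanics, here is the scheme translated into Java as a sketch (the C version would use pthreads plus a sem_t or condition variable; every name below is made up). A blocking queue plays the role of the semaphore-protected task queue: four handler threads block on take() until work arrives, and the final loop stands in for the listener thread reading the Unix socket.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ListenerWithHandlers {
        static final int HANDLERS = 4;
        // take() blocks when the queue is empty, like downing the semaphore;
        // put() wakes a waiting handler, like upping it.
        static final BlockingQueue<String> tasks = new ArrayBlockingQueue<>(64);

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < HANDLERS; i++) {
                final int id = i;
                new Thread(() -> {
                    try {
                        while (true) {
                            String task = tasks.take(); // wait for work
                            System.out.println("handler " + id + " unzips " + task);
                            Thread.sleep(1_000);        // stand-in for the real work
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }
            // Stand-in for the listener: in the real program this loop would
            // read task names from the Unix socket connection to PHP.
            for (int n = 0; n < 10; n++) {
                tasks.put("upload-" + n + ".zip");
            }
        }
    }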
I have two Camel applications whose duty is to read files from the same directory, process them, and send them to a DB consumer. To do this, my endpoints look like this:
file:/data/air?preMove=thread&readLock=fileLock&idempotent=true&idempotentRepository=#fileStore&include=AIROUTPUTCDR_(.*).AIR.gz&move=/data/air/success&moveFailed=error
As you can see, the application polls files from the poll directory based on filters, moves them under the thread dir to read them, reads each file, and moves it to the success folder.
But with this flow, if I kill an application and start it again, the files which were being processed will not be processed, because they are left under the thread folder.
My question is: is there a way to resume reading the files that were interrupted like this?
Thanks
No. If you do a hard kill on the application while a file was pre-moved, then you would need to manually move these files from the preMove folder back into the source folder so they can be picked up again.
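If you want to automate that cleanup, one option (a sketch, not a built-in Camel feature; the class name is made up and the paths mirror the endpoint above) is a small recovery route that drains the preMove folder back into the poll directory when the application starts. Note that with idempotent=true and a persistent #fileStore, the interrupted file names may still be recorded as consumed, so they might also need to be removed from the idempotent repository before they will be picked up again.

    import org.apache.camel.builder.RouteBuilder;

    public class RecoverPreMovedFiles extends RouteBuilder {
        @Override
        public void configure() {
            // Copy anything left behind in the preMove folder back into the
            // poll directory, deleting the leftover so it is not re-copied.
            from("file:/data/air/thread?initialDelay=0&delete=true")
                .to("file:/data/air");
        }
    }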