from("seda:start)
.multicast(new GroupedBodyAggregationStrategy())
.parallelProcessing()
.to("seda:process1", "seda:process2")
.end()
.to("seda:join");
The plan is for process1 and process2 to run in parallel and for their output to be available on the join endpoint.
The above works fine with "direct", but with "seda" the "join" endpoint is invoked immediately even though process1 and process2 are still in progress.
I have tried adding the following options to process1 and process2:
to("seda:process1?waitForTaskToComplete=Always", "seda:process2?waitForTaskToComplete=Always")
It now behaves okay (I can retrieve the process1 and process2 outputs on the join endpoint), but each whole chain of requests is queued instead of running in parallel. For example, when I send two messages in parallel to the "start" endpoint, the second full chain is only triggered after the first full chain has completed.
Any ideas?
You can make the start and join components use seda, whereas process1 and process2 use multicast with parallelProcessing, which will take care of running these processes in parallel.
And for seda:start use something like
from("seda:start?concurrentConsumers=10")
This will accept up to 10 requests in parallel. For more information, please take a look at http://camel.apache.org/seda.html
We have an application where we use the file component of Apache Camel. We implemented our own comparator, which we reference using #sorter. The file component reads files from four different folders and sorts them.
We have maxMessagesPerPoll set to 0 and eagerMaxMessagesPerPoll set to false.
The issue described below happens when we have somewhere between 1k and 5k files in the four folders combined.
Camel apparently has two threads, thread #1 and thread #2. Usually thread #1 runs the sorting code and thread #2 processes the files. But when there are between 1k and 5k files or more, even thread #1 starts processing, which causes files to go out of order. See the logs in Listing 1 for an example of thread #1 and thread #2 both processing files.
FYI, the initial sorting for all 5000 files was done by thread #1, but during processing thread #1 at times contributes to processing files too, which results in files going out of order. This does not happen if the number of files is low, e.g. 200; then only thread #2 processes the files.
How can I keep the processing confined to just thread #2? Is there a property that can be set?
Listing 1
20200829 13:45:00.516 - [Camel (xyz) **thread #1** - file:///export/data/abc/xyz/zyz] INFO a.b.c.Transformer - Processing started for file /export/data/abc/xyz/zyz//f/g/h../run/file1.xml
20200829 13:45:00.576 - [Camel (xyz) **thread #1** - file:///export/data/abc/xyz/zyz] INFO a.b.c.Transformer - Processing completed for file /export/data/abc/xyz/zyz//f/g/h../run/file1.xml in 0 seconds
20200829 15:15:14.910 - [Camel (xyz) **thread #2** - Threads] INFO a.b.c.Transformer - Processing started for file /export/data/abc/xyz/zyz/g/f/h../run/file2_XML
20200829 15:15:15.007 - [Camel (xyz) **thread #2** - Threads] INFO a.b.c.Transformer - Processing completed for file /export/data/abc/xyz/zyz/g/f/h../run/file2_XML in 0 seconds
I tried the following suggestion:
Use maxMessagesPerPoll=1 and set eagerMaxMessagesPerPoll=false
as found here http://www.davsclaus.com/2008/12/camel-and-file-sorting.html
but that presents its own problems. Say there are 3000 files: it processes one file and then re-sorts the remaining files, which slows the whole process down considerably, since sorting takes more than 45 minutes.
The secret to this was to use the synchronous query option, as described in the File2 component documentation of Apache Camel. There will always be two threads, but once you use synchronous, only thread #2 processes the files and not thread #1.
Additionally, leave maxMessagesPerPoll at 0 and set eagerMaxMessagesPerPoll to true.
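For illustration, a sketch of what the consumer endpoint could look like with these options (the folder path and the target endpoint are placeholders, not taken from the setup above):
from("file:/data/input?sorter=#sorter&maxMessagesPerPoll=0&eagerMaxMessagesPerPoll=true&synchronous=true")
    .to("direct:handleFile"); // placeholder for the existing processing steps
With synchronous=true the consumer processes each exchange synchronously, which, as described above, keeps the file handling on thread #2 so the sorted order is preserved.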
I would have to say that the Camel documentation is poor and not without its grammatical mistakes.
I am using Camel, ActiveMQ, and JMS to poll a directory and process any files it finds. The problem with larger files is that they start processing before being fully copied into the directory. I had assumed (yes, I know what assume gets you) that the file system would prevent it, but that doesn't seem to be true. The examples in the Camel docs do not seem to be working. Here is my code from within the configure method of the RouteBuilder:
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?consumer.initialDelay=100000")
.doTry()
.setProperty("servicePath").constant("/job")
.setProperty("serviceMethod").constant("POST")
.process("engImportJobsFromFileProcessor")
.doCatch(Exception.class)
.to("log:-- Add Job(s) Error -------------------------")
.choice()
.when(constant(env.getProperty("eng.mail.enabled.flag.add.jobslist.yn")).isEqualToIgnoreCase("Y"))
.setHeader("subject", constant(env.getProperty("integration.mq.topic.add.eng.jobslist.error.email.subject")))
.to("direct://email.eng")
.otherwise()
.to("log:-----------------------------------------")
.to("log:-- Email for JOBSLIST IS DISABLED")
.to("log:-----------------------------------------")
.end()
.end()
.log("Finished loading jobs from file ")
;
As you can see, I tried to set an 'initialDelay'; I have also tried 'delay' and 'readLock=changed', and nothing made a difference. As soon as the file hits the directory, Camel starts processing it. All I am after is a nice simple delay before the file is polled. Any ideas?
Use option readLockMinAge.
From File2 component documentation:
This option allows you to specify a minimum age a file must be before attempting to acquire the read lock. For example, use readLockMinAge=300s to require that the file is at least 5 minutes old.
For a 100s delay the URI could look like this:
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?readLock=changed&readLockMinAge=100s")
Use a combination of the options readLock=changed, readLockCheckInterval=1000 and readLockMinAge=20s.
(1000 is in milliseconds and is the default value; it should be changed to a higher value if writes are slower, i.e. the file size changes only after a long time. This may happen on certain filesystems where the file size does not change very frequently while a transfer is in progress.)
The file component documentation at http://camel.apache.org/file2.html says
for readLock=changed
changed is using file length/modification timestamp to detect whether the file is currently being copied or not. Will at least use 1 sec. to determine this, so this option cannot consume files as fast as the others, but can be more reliable as the JDK IO API cannot always determine whether a file is currently being used by another process. The option readLockCheckInterval can be used to set the check frequency.
for readLockCheckInterval=1000
Camel 2.6: Interval in milliseconds for the read-lock, if supported by the read lock. This interval is used for sleeping between attempts to acquire the read lock. For example when using the changed read lock, you can set a higher interval period to cater for slow writes. The default of 1 sec. may be too fast if the producer is very slow writing the file.
for readLockMinAge=20s
Camel 2.15: This option applies only to readLock=changed. This option allows you to specify a minimum age a file must be before attempting to acquire the read lock. For example, use readLockMinAge=300s to require that the file is at least 5 minutes old. This can speed up the poll when the file is old enough, as it will acquire the read lock immediately.
So in the end your endpoint should look something like
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?consumer.initialDelay=100000&readLock=changed&readLockCheckInterval=1000&readLockMinAge=20s")
OK, it turned out to be a combination of things. First off, I test inside of IntelliJ and also outside of it for several reasons; one is a security issue with using email within IDEA. Tomcat, outside of IntelliJ, was picking up a classes folder in the webapps/ROOT directory, which was overwriting my changes to the URI options. That's what was driving me nuts. That ROOT folder had been there from a deployment error several months ago, but it wasn't being picked up by IntelliJ even though I was using the same Tomcat instance. That's why it appeared that my changes were being ignored.
How do I add a log message before the FTP consumer starts
for the following route?
from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
.to("file://folder/output");
The log message should state that the FTP poll has started.
The log message should also contain the filter result, i.e. whether the file has been processed before or not.
These messages should have logLevel=INFO.
pollEnrich is not a solution.
The way you wrote it, your route starts as soon as execution hits your "from" instruction, so you can log the beginning with a simple Java log statement just before it.
Additionally, you can also delay the route, change its startup order, or remove automatic start, as documented here.
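For the first point, a minimal sketch of logging just before the route is defined (the class name and the use of an SLF4J logger are my assumptions):
private static final Logger LOG = LoggerFactory.getLogger(MyFtpRouteBuilder.class);

@Override
public void configure() {
    // plain Java log statement just before the from(...) instruction
    LOG.info("Starting FTP polling route");
    from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
        .to("file://folder/output");
}
Note that this logs once, when the route is set up, not on every poll; per-file logging is what the .log(...) step below covers.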
To your second question, about logging your FTP client activity, you can do it this way:
from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
.log("Processing ${file:name}")
.to("file://folder/output");
In LLDB console, my process is stopped. I run thread step-in and eventually get:
Command timed out
How do I extend or disable this timeout?
In my case, this timeout is expected because the program requires external interaction before going to the next line.
thread step-in has no timeout. That wouldn't make any sense, as your last comment demonstrates.
The print command can take a timeout, but by default does not use one. If you run po, the object-description printing part of that command is run with a timeout. And if you have any code-running variable formatters, they are also run with a timeout. lldb has removed most of the built-in code-running formatters, though there are a few of them still around and they could also be responsible for the timeout message. But other than printing, there aren't really that many things lldb does with a timeout...
Anyway, what you are probably seeing is that after the previous stop happened some code was being run to present locals or something similar and that command was what timed out.
If you can get this to happen reliably, then please file a bug with http://bugreporter.apple.com.
I am working on an application where I need to detect a system shutdown.
However, I have not found any reliable way to get a notification of this event.
I know that on shutdown my app will receive a SIGTERM signal followed by a SIGKILL. I want to know if there is any way to query whether a SIGTERM is part of a shutdown sequence.
Does anyone know if there is a way to query that programmatically (C API)?
As far as I know, the system does not provide any other method to query for an impending shutdown. If it did, that would solve my problem as well. I have been trying out runlevels too, but changes in runlevels seem to be instantaneous and come without any prior warning.
Maybe a little bit late, but yes, you can determine whether a SIGTERM is part of a shutdown by invoking the runlevel command. Example:
#!/bin/bash
# On SIGTERM, record the previous and current runlevel, then exit.
trap "runlevel >$HOME/run-level; exit 1" term
read line
echo "Input: $line"
Save it as, say, term.sh and run it. After executing killall term.sh, you should be able to see and investigate the run-level file in your home directory. Then execute any of the following:
sudo reboot
sudo halt -p
sudo shutdown -P
and compare the differences in the file. Then you should have an idea of how to do it.
There is no way to determine whether a SIGTERM is part of a shutdown sequence. To detect a shutdown sequence you can either use rc.d scripts, as ereOn and Eric Sepanson suggested, or use mechanisms like DBus.
However, from a design point of view it makes no sense to ignore SIGTERM even if it is not part of a shutdown. SIGTERM's primary purpose is to politely ask apps to exit cleanly and it is not likely that someone with enough privileges will issue a SIGTERM if he/she does not want the app to exit.
From man shutdown:
If the time argument is used, 5 minutes before the system goes down
the /etc/nologin file is created to ensure that further logins shall
not be allowed.
So you can test for the existence of /etc/nologin. It is not optimal, but probably the best you can get.
It's a little bit of a hack, but if the server is running systemd you can run
/bin/systemctl list-jobs shutdown.target
... it will report ...
JOB UNIT TYPE STATE
755 shutdown.target start waiting <---- existence means shutting down
1 jobs listed.
... if the server is shutting down or rebooting (hint: there's a reboot.target if you want to look specifically for that).
You will get "No jobs running." if it is not being shut down.
You have to parse the output, which is a bit messy, as systemctl doesn't return a different exit code for the two results. But it does seem reasonably reliable. You will need to watch out for a format change in the messages if you update the system, however.
Making your application respond differently to some SIGTERM signals than others seems opaque and potentially confusing. It's arguable that you should always respond the same way to a given signal. Adding unusual conditions makes it harder to understand and test application behavior.
Adding an rc script that handles shutdown (by sending a special signal) is a completely standard way to handle such a problem; if this script is installed as part of a standard package (make install or rpm/deb packaging) there should be no worries about control of user machines.
I think I got it.
Source =
https://github.com/mozilla-b2g/busybox/blob/master/miscutils/runlevel.c
I copy part of the code here, just in case the reference disappears.
#include "libbb.h"
...
struct utmp *ut;
char prev;
if (argv[1]) utmpname(argv[1]);
setutent();
while ((ut = getutent()) != NULL) {
if (ut->ut_type == RUN_LVL) {
prev = ut->ut_pid / 256;
if (prev == 0) prev = 'N';
printf("Runlevel: prev=%c current=%c\n", prev, ut->ut_pid % 256);
endutent();
return 0;
}
}
puts("unknown");
See man systemctl; you can determine if the system is shutting down like this:
if [ "`systemctl is-system-running`" = "stopping" ]; then
# Do what you need
fi
This is in bash, but you can do it with system() in C.
The practical answer to do what you originally wanted is to check for the shutdown process (e.g. ps aux | grep "shutdown -h") and then, if you want to be sure, check its command line arguments and the time it was started (e.g. "shutdown -h +240" started at 14:51 will shut down at 18:51).
In the general case, from the point of view of the entire system, there is no way to do this. There are many different ways a "shutdown" can happen. For example, someone can decide to pull the plug in order to hard-stop a program that they know has bad or dangerous behaviour at shutdown time, or a UPS could first send a SIGHUP and then simply fail. Since such a shutdown can happen suddenly and with no warning anywhere in the system, there is no way to be sure that it's okay to keep running after a SIGHUP.
If a process receives SIGHUP you should basically assume that something nastier will follow soon. If you want to do something special and partially ignore SIGHUP, then a) you need to coordinate that with whatever program will do the shutdown, and b) you need to be ready so that if some other system does the shutdown and kills you dead soon after a SIGHUP, your software and data will survive. Write out any data you have and only continue writing to append-only files with safe atomic updates.
For your case I'm almost sure your current solution (treat all SIGHUPs as a shutdown) is the correct way to go. If you want to improve things, you should probably add a feature to the shutdown program which sends a notification via DBus or something similar.
When the system shuts down, the rc.d scripts are called.
Maybe you can add a script there that sends some special signal to your program.
However, I doubt you can stop the system shutdown that way.