How to add a log message before camel-ftp starts? - apache-camel

How do I add a log message before the ftp consumer starts, for the following route?
from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
.to("file://folder/output");
The log message should state that the ftp consumer has started.
A second message should contain the idempotent filter result, i.e. whether the file was processed before or not.
Both messages should have logLevel=INFO.
pollEnrich is not a solution.

The way you wrote it, your route starts as soon as execution hits your "from" instruction, so you can log the beginning with a simple Java log statement just before it.
Additionally, you can also delay the start, change the startup order, or disable automatic startup, as documented in the Camel documentation on configuring route startup.
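A minimal sketch of those options (the route id "ftpRoute" and the manual start call are illustrative; getRouteController() assumes Camel 3):
from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
    .routeId("ftpRoute")   // hypothetical route id
    .autoStartup(false)    // do not start automatically with the CamelContext
    .startupOrder(1)       // or control when this route starts relative to others
    .to("file://folder/output");

// later, log and then start the route manually
log.info("Starting ftp route");
camelContext.getRouteController().startRoute("ftpRoute");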
For your second question, about logging your ftp consumer's activity, you can do it this way:
from("ftp://...idempotentKey=..&idempotentRepository=#MyRepo&delay=..")
.log("Processing ${file:name}")
.to("file://folder/output");
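Note that .log(...) logs at INFO by default, and it only fires for files that pass the idempotent filter; files that were already processed are skipped silently. To also log the filter decision itself, one option is to wrap the repository. A minimal sketch, assuming Camel 3 and an in-memory repository (register an instance under the name MyRepo so idempotentRepository=#MyRepo picks it up):
import org.apache.camel.support.processor.idempotent.MemoryIdempotentRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingIdempotentRepository extends MemoryIdempotentRepository {
    private static final Logger LOG = LoggerFactory.getLogger(LoggingIdempotentRepository.class);

    @Override
    public boolean contains(String key) {
        boolean seenBefore = super.contains(key);
        // the ftp consumer skips the file when this returns true
        LOG.info("Idempotent filter result for {}: {}", key,
                seenBefore ? "processed before, skipping" : "new file, processing");
        return seenBefore;
    }
}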

Related

Using camel-smb, SMB picks up (large) files while they are still being written to

When trying to set up cyclic moving of files, I encountered strange behavior with readLock. Create a large file (some 100 MBs) and transfer it using SMB from the out to the in folder.
FROM:
smb2://smbuser:****@localhost:4455/user/out?antInclude=FILENAME*&consumer.bridgeErrorHandler=true&delay=10000&inProgressRepository=%23inProgressRepository&readLock=changed&readLockMinLength=1&readLockCheckInterval=1000&readLockTimeout=5000&streamDownload=true&username=smbuser&delete=true
TO:
smb2://smbuser:****@localhost:4455/user/in?username=smbuser
Create another route to move the file back from the in to the out folder. After some transfers, the file will be picked up while still being written to by the other route, and the transfer will complete with a much smaller file, resulting in a partial file at the destination.
FROM:
smb2://smbuser:****@localhost:4455/user/in?antInclude=FILENAME*&delete=true&readLock=changed&readLockMinLength=1&readLockCheckInterval=1000&readLockTimeout=5000&streamDownload=false&delay=10000
TO:
smb2://smbuser:****@localhost:4455/user/out
The question is: why is my readLock not working properly? (P.S. streamDownload is required.)
UPDATE: it turns out this only happens on a Windows Samba share, and with streamDownload=true. So, something with stream chunking. Any advice is welcome.
The solution requires preventing the polling strategy from automatically picking up a file, by making it aware of the read lock (in progress) on the other side. So I lowered the delay to 5 seconds and, in the FROM part on both sides, added readLockMinAge=5s, which inspects the file modification time.
Since the stream updates the file at least every second, the file never looks old enough while a transfer is in progress, which prevents the read lock from being acquired prematurely.
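A sketch of the adjusted consumer endpoint under that change (the same options as the original FROM endpoint, with the delay lowered to 5 seconds and readLockMinAge added):
FROM:
smb2://smbuser:****@localhost:4455/user/out?antInclude=FILENAME*&consumer.bridgeErrorHandler=true&delay=5000&inProgressRepository=%23inProgressRepository&readLock=changed&readLockMinLength=1&readLockCheckInterval=1000&readLockTimeout=5000&readLockMinAge=5s&streamDownload=true&username=smbuser&delete=true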
An explanation of why the situation described above happens:
When one route prepares to pick up from the out folder, a large file (1 GB) is being streamed chunk by chunk into the in folder. At the end of the streaming, the file is marked for removal by camel-smbj and receives the status STATUS_DELETE_PENDING.
Now another part of the process starts to send a newly arrived file to the out folder and finds that the file already exists. Because of the default fileExists=Override strategy, it tries to delete (and afterwards store) the existing file, which is still not deleted from the previous step, and receives an exception that causes some InputStream chunks to be lost.

Camel File Consumer - leave file after processing but accept files with same name

So this is the situation:
I have a workflow that waits for files in a folder, processes them and then sends them to another system.
For different reasons we use an ActiveMQ broker between the "sub-processes" in the workflow, where each route alters the message in some way before it is sent in the last step. Each "sub-process" only reads and writes to/from ActiveMQ, except the first and last route.
It is also part of the workflow that there is a route after sending the message that takes care of the initial file, moving or deleting it. Only this route knows what to do with the file.
This means that the file has to stay in the folder after the consumer route has finished, because only the metadata is written to ActiveMQ; the actual workflow is not done yet.
I got this to work using the noop=true parameter on the file consumer.
The problem with this is that after the "After Sending Route" deletes (or moves) the file, the file consumer will not react to new files with the same name until I restart the route.
It is clear that this is the expected and correct behavior, because it's the point of the noop parameter to ignore a file that was consumed before, but this doesn't help me.
The question is now how I get the file consumer to process a file only once as long as it is present in the folder, but "forget" about it as soon as some other process (in this case a different route) removes the file.
As an alternative, I could let the file component move the file into a temp folder, from where it gets processed later, leaving the consuming folder empty. But this introduces new problems that I'd like to avoid (e.g. a file with the same name being moved into the folder while the first one is not yet completely processed).
I'd love to hear some ideas on how to handle that case.
Greets Chris
You need to tell Camel not to use only the filename for idempotency checking.
In a similar situation, where I wanted to pick up changes to a file that was otherwise no-oped, I have the option
idempotentKey=${file:name}-${file:modified}
in my url, which ensures that if you change the file, or a new file is created, it is treated as a different file and processed.
Do be careful to check how many files you might be processing, because the idempotent cache is limited by default (to 1000 entries, I think). If you were potentially processing more than 1000 files at a time, it might "forget" that it has already processed file 1 when file 1001 arrives, and try to reprocess file 1 again.
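A minimal sketch putting both points together, assuming Camel 3 (the folder name, queue name and repository bean name "myRepo" are placeholders): the modification time becomes part of the idempotent key, and the default 1000-entry in-memory cache is enlarged so entries are not evicted while files are still pending.
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.support.processor.idempotent.MemoryIdempotentRepository;

public class LeaveFileRoute extends RouteBuilder {
    @Override
    public void configure() {
        // enlarge the idempotent cache beyond the 1000-entry default
        getContext().getRegistry().bind("myRepo",
                MemoryIdempotentRepository.memoryIdempotentRepository(5000));

        from("file://input?noop=true"
                + "&idempotentKey=${file:name}-${file:modified}"
                + "&idempotentRepository=#myRepo")
            // hand the metadata to the broker and leave the file in place
            .to("activemq:queue:workflow");
    }
}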

How do I introduce a Delay in Camel to prevent file locking before the files are copied in?

I am using Camel, ActiveMQ, and JMS to poll a directory and process any files it finds. The problem with larger files is that they start processing before being fully copied into the directory. I had assumed (yes, I know what assume gets you) that the file system would prevent this -- but that doesn't seem to be true. The examples in the Camel docs do not seem to be working. Here is my code from within the configure method of the RouteBuilder:
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?consumer.initialDelay=100000")
.doTry()
.setProperty("servicePath").constant("/job")
.setProperty("serviceMethod").constant("POST")
.process("engImportJobsFromFileProcessor")
.doCatch(Exception.class)
.to("log:-- Add Job(s) Error -------------------------")
.choice()
.when(constant(env.getProperty("eng.mail.enabled.flag.add.jobslist.yn")).isEqualToIgnoreCase("Y"))
.setHeader("subject", constant(env.getProperty("integration.mq.topic.add.eng.jobslist.error.email.subject")))
.to("direct://email.eng")
.otherwise()
.to("log:-----------------------------------------")
.to("log:-- Email for JOBSLIST IS DISABLED")
.to("log:-----------------------------------------")
.end()
.end()
.log("Finished loading jobs from file ")
;
As you can see, I tried to set an 'initialDelay'; I have also tried 'delay' and 'readLock=changed' and nothing made a difference. As soon as the file hits the directory, Camel starts processing. All I am after is a nice simple delay before the file is polled. Any ideas?
Use the option readLockMinAge.
From the File2 component documentation:
This option allows you to specify a minimum age a file must be before attempting to acquire the read lock. For example, use readLockMinAge=300s to require that the file is at least 5 minutes old.
For a 100s delay, the URI could look like this:
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?readLock=changed&readLockMinAge=100s")
Use a combination of the options readLock=changed, readLockCheckInterval=1000 and readLockMinAge=20s.
(1000 is in milliseconds and is the default value; it should be raised if writes are slower, i.e. the file size changes only after a long time. On certain filesystems the reported file size does not change very frequently while a transfer is in progress.)
The file component documentation at http://camel.apache.org/file2.html says
for readLock=changed
changed is using file length/modification timestamp to detect whether the file is currently being copied or not. Will at least use 1 sec. to determine this, so this option cannot consume files as fast as the others, but can be more reliable as the JDK IO API cannot always determine whether a file is currently being used by another process. The option readLockCheckInterval can be used to set the check frequency.
for readLockCheckInterval=1000
Camel 2.6: Interval in milliseconds for the read-lock, if supported by the read lock. This interval is used for sleeping between attempts to acquire the read lock. For example when using the changed read lock, you can set a higher interval period to cater for slow writes. The default of 1 sec. may be too fast if the producer is very slow writing the file.
for readLockMinAge=20s
Camel 2.15: This option applies only to readLock=changed. This option allows you to specify a minimum age a file must be before attempting to acquire the read lock. For example, use readLockMinAge=300s to require that the file is at least 5 minutes old. This can speed up the poll when the file is old enough, as the read lock will be acquired immediately.
So in the end your endpoint should look something like
from("file://" + env.getProperty("integration.directory.scan.add.eng.jobslist")+"?consumer.initialDelay=100000&readLock=changed&readLockCheckInterval=1000&readLockMinAge=20s")
OK, it turned out to be a combination of things. First off, I test inside of IntelliJ and also outside it, for several reasons -- one being a security issue with using email within IDEA. Tomcat, outside of IntelliJ, was picking up a classes folder in the webapps/ROOT directory, which was overriding my changes to the URI options. That's what was driving me nuts. That ROOT folder had been there since a deployment error several months ago, but it wasn't being picked up by IntelliJ even though I was using the same Tomcat instance. That's why it appeared that my changes were being ignored.

ProFTPD Extended Log - Use a subset of command classes instead of whole command class

I am building a log parser for ProFTPD and have a question regarding the ExtendedLog config directive.
Official ProFTPD documentation has the following ExtendedLog spec:
ExtendedLog [ filename [[command-classes] format-nickname]]
There are a couple of valid command-classes, but they mostly consist of groups of commands. For me, this is a problem: when a user uploads a large file, and there are many users and many uploads, a WRITE command appears in the extended log for each portion of the actual upload, so for a large file WRITE occurs many times. This can fill up the log space fairly easily for large uploads. In comparison, the STOR command is visible only once, at the end of the actual file upload.
I can't explicitly find WRITE as one of the commands in the write command class, but I was wondering if there is a way to omit this specific WRITE command from the log, as I'm only interested in a subset of the commands in the write command class. The commands that I'm particularly and only interested in logging are STOR, DELE and RMD.
Many thanks.
In the end I did not find any flags in ProFTPD that could handle this, so I implemented log rotation instead.
The log rotation restarts ProFTPD and sends an interrupt to the log parser. The log parser then detects the interrupt, reads the current log file and stops processing. The log rotation program then empties out the original log file.
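A hypothetical logrotate stanza along those lines (the paths, the signal, and the parser PID file are illustrative assumptions, not from the original answer; the ordering of the parser interrupt versus emptying the file needs care in practice):
/var/log/proftpd/extended.log {
    daily
    rotate 7
    postrotate
        # restart ProFTPD so it reopens its log file
        /usr/sbin/service proftpd restart
        # interrupt the parser so it reads the remaining lines and stops
        kill -USR1 "$(cat /var/run/logparser.pid)"
    endscript
}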

Why doesn't inotify update?

I'm writing an inotify watcher in C for a Minecraft server. Basically, it watches server.log, gets the latest line, parses it, and if it matches a regex, performs some actions.
The program works fine when I append to the log manually with "echo string matching the regex >> server.log"; it parses the line and does what it should. However, when the string is written to the file by the Minecraft server itself, nothing happens until I shut down the server or (sometimes) log out.
I would post code, but I'm wondering if this has something to do with ext4 flushing data to disk, or something along those lines; a filesystem problem. It would be odd if that were the case, though, because "tail -f server.log" updates whenever the file does.
Solved my own problem. It turned out the server was writing to the log file faster than the watcher could read from it, so the watcher was getting out of sync.
I fixed it by adding a check after the event is processed: if the number of lines currently in the log file is greater than the recorded length of the log, reprocess the file until the two are equal.
Thanks for your help!
Presumably that is because you are watching for IN_CLOSE events, which may not occur until the server shuts down (and closes the log file handle). See inotify(7) for valid mask parameters for the inotify_add_watch() call. I expect you'll want to use IN_MODIFY.
Your theory is more than likely correct: the log output is buffered, and the log writer never flushes that buffer, so everything remains in the buffer until the file is closed or the buffer is full. A fast way to test this is to run the server to a point where you know it would have written events to the log, then kill it forcibly so it cannot close the handle; if the log is empty, it is definitely the buffer. If you can get hold of the file handle/descriptor, you can use setbuf to remove the buffering, at the cost of performance.
