Apache Camel running on multiple nodes - apache-camel

I'm running Apache Camel on multiple nodes, and the source folder is the same for all of them.
The nodes are not processing files in parallel: if node1 is processing a file, node2 waits to acquire the lock on the file node1 has locked instead of picking up other files.
I want all nodes to process files in parallel: if node1 is processing file1, node2 should move on to the other files and skip file1.

The problem is that all nodes get the same file listing, so they all try to process the same file.
Try the shuffle option, so that every node gets a randomly sorted file listing.
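A minimal sketch of what that could look like on the consumer endpoint (the directory, the readLock strategy and the move target are illustrative assumptions, not taken from the question):

import org.apache.camel.builder.RouteBuilder;

public class ShuffledFileRoute extends RouteBuilder {
    @Override
    public void configure() {
        // shuffle=true randomises the file listing on every poll, so the nodes
        // are less likely to all start with the same file. readLock=changed and
        // move=.done are placeholder choices for this sketch.
        from("file:/shared/source?shuffle=true&readLock=changed&move=.done")
            .to("log:processedFiles");
    }
}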

Related

Moving a file to another directory in ABAP

I have a service that runs at 5-second intervals against a specific directory; it picks up an XML file created there, sends it to another client for the necessary authorization checks, and then requests a response file.
My issue is that the Z_PROGRAM creating the XML file might take longer than 5 seconds because of the file's size, so creating the file directly in that directory is not preferable. I thought about creating a subfolder called "temporary" inside that directory, creating the file there, and once I'm done with it, moving it back out for the service to pick up.
Is there any way to move files from one directory to another via ABAP code only?
Copying the file manually is not an option, since the problem I have during file creation would still persist. I need two alternatives: one for local directories and one for application server directories. Any ideas?
Generally, we create a second, empty marker file after the file creation process ends, and the third party first checks that this marker file exists before reading the data file. Example:
data file.csv
data file.ok
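As an aside, if the consuming side happens to be Apache Camel (the topic of this page), this marker-file convention maps onto the file component's doneFileName option; a minimal sketch, with a made-up directory:

import org.apache.camel.builder.RouteBuilder;

public class DoneFileRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Only pick up "data file.csv" once the matching "data file.ok"
        // marker exists in the same directory; the path is a placeholder.
        from("file:/interface/in?doneFileName=${file:name.noext}.ok")
            .to("log:pickedUp");
    }
}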
If your integration is already complete and it is not easy to change anything with the third parties, I prefer using OS-level file move commands (sample document here). You can use mv on a Linux server and move on Windows. If your file is big, you will hit the same problem with the OPEN DATASET approach. There is also the ARCHIVFILE_SERVER_TO_SERVER function module for moving files, but it uses OPEN DATASET as well.
There is no explicit move command in ABAP that moves or copies files between directories on the application server.
There are two tips that can be helpful in your case. If you are writing a big file, separate the logic of collecting the data from the logic of writing the file. In other words, don't execute TRANSFER inside your collection loop; instead, collect your data into an internal table and, once you're done, loop over that internal table and write the strings directly, without any delay. That way you should be able to write a big file of up to several hundred MB in under a second.
The next tip, if you don't want to modify your program (or the function modules you use to construct the XML), is to write to a temporary directory first; after finishing, have another program read your file from the source directory with READ DATASET and write the data straight to the new directory, again just strings, without interruptions.
You should be OK if you just write strings.
You can simply use system call commands to perform actions on the application server directory:
" Move the file at OS level; the command runs on the application server's OS
CALL 'SYSTEM'
  ID 'COMMAND'
  FIELD 'mv /usr/sap/temporary/File.xml /usr/sap/final/file.xml'.

Batch Programming - Not deleting partial files being written

I am trying to develop a batch script which copies a list of files from a source directory to another directory and performs some functions. Once the copy has been done, the files need to be deleted from the source directory.
In the source directory, we are receiving files in real time, 24x7, from an external source.
How can I make sure that the copy or the delete from the script does not impact any files which are currently being sent from the external source (incomplete files which are in transit at that moment), whilst the script is doing its own job?
The script needs to be run every 5 minutes.
Due to some challenges, we cannot use xxcopy for our functionality.
Please can you advise.
Instead of COPY & DELETE, use MOVE, so the batch fragment would look like this:
FOR %%F IN (C:\SOURCE\*.*) DO MOVE "%%F" C:\TARGET\
If a file is incomplete (another process is still using it), the MOVE on that file will fail. The next time the script runs, it will attempt to move that file again.

Too many open files error when reading from a directory

I'm using readTextFile(/path/to/dir) in order to read batches of files, do some manipulation on the lines and save them to Cassandra.
Everything seemed to work just fine until I reached more than 170 files in the directory (files are deleted after a successful run).
Now I'm receiving "IOException: Too many open files", and a quick look at lsof shows thousands of file descriptors open once I run my code.
Almost all of the file descriptors are sockets.
Testing on a smaller scale with only 10 files resulted in more than 4,000 file descriptors being opened; once the script has finished, all the file descriptors are closed and everything is back to normal.
Is this normal behavior for Flink? Should I increase the ulimit?
Some notes:
The environment is Tomcat 7 with Java 8, Flink 1.1.2, using the DataSet API.
The Flink job is scheduled with Quartz.
All those 170+ files sum to about 5 MB in total.
Problem solved.
After narrowing down the code, I found that calling "Unirest.setTimeouts" inside a highly parallel "map()" step caused too many thread allocations, which in turn consumed all my file descriptors.
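In practice the fix amounts to configuring the HTTP client once per task instead of once per record. A minimal sketch of what that might look like with a RichMapFunction (the class name, timeout values and body of map() are illustrative, not the original job's code):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import com.mashape.unirest.http.Unirest;

public class EnrichLine extends RichMapFunction<String, String> {

    @Override
    public void open(Configuration parameters) {
        // Configure Unirest once when the task starts. Calling setTimeouts
        // inside map() allocated new threads for every record, which is what
        // exhausted the file descriptors (the "sockets" seen in lsof).
        Unirest.setTimeouts(10_000, 60_000);
    }

    @Override
    public String map(String line) throws Exception {
        // ... call the REST service and transform the line here ...
        return line;
    }
}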

Apache Camel GenericFileOperationFailedException: 'Cannot rename file' locks exchange

We have an integration system based on Camel v2.16.1 that runs on a JBoss v6 Linux platform. There are multiple interfaces running simultaneously, each with a different polling rate.
We are intermittently experiencing a 'Cannot rename file' issue, with Camel failing to back up successfully processed and transmitted files from the FTP source to the 'done' folder. Restarting the Camel application fixes the issue.
Basically, at regular intervals triggered by a Quartz scheduler, the route:
picks up files from a source via FTP,
processes them (Smooks + XSL transformations),
delivers the generated flat file to an endpoint via FTP.
If multiple files are read from the source directory, they are all appended together into a temporary file before being processed.
The Camel FTP configuration uses the following URL:
ftp://xxxx/export?antInclude=dsciord_*.dat&inProgressRepository=#warehouseIntegrationIdempotentRepository&preMove=in_progress_bpo/$simple{date:now:yyyyMMddHHmm}/$simple{file:name}&move=done&consumer.bridgeErrorHandler=true
read files dsciord_*.dat from the /export directory
use a custom inProgressRepository to store the read filename in a local db (this was done to prevent a contention issue with a second cluster node; however, currently only a single node is live, so this option is unnecessary and could be removed to speed up the process)
move files to an in_progress_bpo/201609061522 directory, where the subdirectory name is created from the date timestamp
move them to the in_progress_bpo/201609061522/done subdirectory once successfully processed.
In the vast majority of cases the route works with no issues; however, sometimes the file(s) cannot be moved to the done folder (see error below). Even then, the route can sometimes continue successfully at the next polling cycle, but in other cases it enters a state where, even if the Quartz scheduler triggers the poll, it fails to detect any files in the source /export directory even when there ARE files there.
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file: RemoteFile[in_progress_bpo/201609060502/dsciord_3605752.dat] to: RemoteFile[in_progress_bpo/201609060502/done/dsciord_3605752.dat]
Notes: We are using
a single instance of a ConsumerTemplate to handle our interfaces (a short sketch of this polling pattern follows below),
a custom inProgressRepository to store the file names read.
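For readers unfamiliar with that pattern, polling an endpoint through a ConsumerTemplate looks roughly like this (a sketch, not our actual code; ftpUri stands for the full endpoint URI quoted above):

import org.apache.camel.CamelContext;
import org.apache.camel.ConsumerTemplate;
import org.apache.camel.Exchange;

public class PollOnce {
    static Exchange pollOnce(CamelContext context, String ftpUri) {
        ConsumerTemplate consumer = context.createConsumerTemplate();
        // Wait up to 5 seconds for a single file from the endpoint.
        Exchange exchange = consumer.receive(ftpUri, 5000);
        if (exchange != null) {
            // ... process the payload here ...
            // Completing the unit of work is what triggers the move of the
            // consumed file into the 'done' folder for file/FTP consumers.
            consumer.doneUoW(exchange);
        }
        return exchange;
    }
}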
Obviously, something must be locking the source files, and this is causing the Camel route to stop processing further files.
Any ideas/suggestions on debugging/resolving this issue would be greatly appreciated. The issues I read about on the camel-users forum seem to deal with Windows-related deployments, or with Smooks failing to close the input stream. I've checked, and we don't use the org.milyn.templating.xslt.XslTemplateProcessor#bypass method where Smooks fails to close the underlying input stream.
Finally, I have been able to reproduce/identify the issue.
We are using a relative path for the directory into which processed files are moved once they have been successfully FTP-ed to the destination servers:
../../../u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030/done
However, for some reason, instead of traversing the correct path to move the processed files, the Camel consumer creates a new subdirectory tree starting from the current working directory, which can end up being quite long, as shown below. Hence the problem: the consumer doesn't know where it is, and it doesn't reset itself.
/u/4gl_upload/warehouse_integration_2/trs-server/u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030
This was reproduced with the option stepwise=false, which means the consumer traverses the directories in a single step instead of stepwise.
I still don't know what the best solution is.

WSO2 ESB processing a large file, stopping and restarting?

Is it possible in the WSO2 ESB to process a large file (with each line representing a single record/message) and stop or pause the processing within the file, then restart where it left off? (using the Smooks mediator and/or Iterate mediator, or any other mechanism)
It appears that if you are processing a large file (say with 10K entries), you cannot stop or pause the processing (or, say, bring the ESB down in the middle of processing the file) and then restart where it stopped. Upon restarting, either the whole file has to be reprocessed, or the file is dumped to the error folder and skipped. Is this correct?
Thanks for any help on this.
AFAIK this cannot be done in a straightforward manner. However, you could probably split the large file into smaller files using the Smooks mediator (message splitting) [1] and then use VFS processing on them [2]. Each file in the VFS location that is processed is either deleted or moved, so whenever the process restarts it will not process the already-processed messages/files.
[1] - https://github.com/smooks/smooks/tree/v1.5.1/smooks-examples/file-router/
[2] - https://docs.wso2.com/pages/viewpage.action?pageId=33136056
