Multiple instances writing to the same mounted folder using the camel-file component - apache-camel

We have a requirement to write to a single file from multiple instances of a Camel interface running simultaneously.
The file is on a Windows shared file system that has been mounted on the JBoss server via SMB.
We are using the Camel file component to write the file from each instance as if it were a local file.
Below is the endpoint URI in the Camel context:
file:/fuse/server/location/proc?fileName=abc.csv&fileExist=Append
The generated file has no issues when the writes come from a single instance, but with multiple instances junk characters appear in the file at random lines.
We are using JBoss Fuse 6.0.0, and the interfaces have been written against Camel 2.10.
How can this be fixed? Is this an issue with the SMB mount, or does the interface need to handle it?
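(For reference, a route around such an endpoint in the Java DSL might look like the following minimal sketch; the direct: source endpoint is an assumption, not the poster's actual route.)

import org.apache.camel.builder.RouteBuilder;

public class AppendRoute extends RouteBuilder {
    @Override
    public void configure() {
        // each incoming message body is appended to the shared CSV file
        from("direct:csvLines")
            .to("file:/fuse/server/location/proc?fileName=abc.csv&fileExist=Append");
    }
}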

I've had a look at the source code of the relevant Camel component (https://github.com/apache/camel/tree/master/camel-core/src/main/java/org/apache/camel/component/file) and there is no built-in support for concurrent access to a single file from multiple JVMs. Concurrent access from a single JVM is handled, though.
I think you have two basic options to address your requirement:
Write some code to support shared access to a single file. The Camel file component looks like it was built with extension in mind, or you could create a standalone component to do this.
As #Namphibian suggests, use some queuing system to serialise your writes (though I don't think seda will work, as it doesn't span JVMs).
My solution would be to use ActiveMQ. Each instance of your application would send messages to a single, shared queue. Some other process would then consume the messages from MQ and write them to disk.
With a single process consuming all MQ messages, there would be no concurrent writes to the filesystem.
A more robust solution would be to run ActiveMQ in a cluster (possibly with a node in each of your application instances). Look at "JMSXGroupID" to prevent concurrent consumption of messages.
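A sketch of that funnel in the Java DSL (the queue name and the direct: endpoint are illustrative, not from the original post):

import org.apache.camel.builder.RouteBuilder;

public class QueueFunnelRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // In every application instance: hand each line to the shared
        // queue instead of writing the file directly.
        from("direct:csvLines")
            .to("activemq:queue:file.writes");

        // In exactly one process: a single consumer drains the queue and
        // is the only writer, so appends can never interleave. With a
        // clustered consumer, setting the JMSXGroupID header on the
        // producer side keeps related messages on one consumer.
        from("activemq:queue:file.writes?concurrentConsumers=1")
            .to("file:/fuse/server/location/proc?fileName=abc.csv&fileExist=Append");
    }
}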

Related

Camel file reading: race condition with 2 active servers

In our ESB project, we have a lot of routes reading files with the file2 or ftp protocol for further processing. It is important to note that the files we read locally (file2 protocol) are network shares mounted via different protocols (NFS, SMB).
Now we are facing issues with race conditions: both servers read the file and process it. We have reduced the likelihood of that by using the preMove option, but from time to time a duplicate read still occurs when both servers poll at the same millisecond. According to the documentation, an idempotentRepository together with readLock=idempotent could help, for example with Hazelcast.
However, I'm wondering whether this is a suitable solution for my issue, as I don't really know if it will work in all cases. Both servers read the file within milliseconds of each other, so the information that one server has already processed the file needs to be available in the Hazelcast grid at the point in time when the second server tries to read. Is that possible? What happens if there are minimal latencies (e.g. network related)?
In addition, the setting readLock=idempotent is only available for file2, not for ftp. How can that issue be solved there?
Again: the issue is not preventing duplicate files in general; it is solely about preventing the race condition.
AFAIK, the idempotent repository should prevent both consumers from reading the same file in your case.
The latency between detection of the file and the entry in Hazelcast is not relevant, because the file consumers do not record what they have read. Instead, they both ask the repository for an exclusive read lock. The first one wins; the second one is denied, so it continues to the next file.
If you want to minimise the potential for conflicts between the consumers, you can turn on shuffle=true to randomise the order in which files are consumed.
For the problem of the missing readLock=idempotent on the ftp consumer: you could perhaps build a separate transfer route with only one consumer that downloads the files. Your file consumer route can then process them idempotently.
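A sketch of both suggestions in the Java DSL (the Hazelcast setup, registry name, paths and hosts are assumptions):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.hazelcast.HazelcastIdempotentRepository;

public class IdempotentFileRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Clustered repository shared by both servers; bind it in the
        // registry under the name "fileRepo" (registration not shown).
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        HazelcastIdempotentRepository fileRepo =
            new HazelcastIdempotentRepository(hz, "camel-file-repo");

        // Both servers run this consumer; the repository grants the
        // read lock for a given file to exactly one of them.
        // shuffle=true randomises polling order to reduce contention.
        from("file:/mnt/share/in?readLock=idempotent"
             + "&idempotentRepository=#fileRepo&shuffle=true")
            .to("direct:process");

        // ftp has no readLock=idempotent, so a single dedicated transfer
        // route downloads the files into the directory watched above.
        from("ftp://user@ftphost/in?delete=true")
            .to("file:/mnt/share/in");
    }
}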

How should an apache module communicate with a separate running process mid-request?

I am looking to design a system as follows:
Apache module to allow/deny requests
Standalone program containing logic to allow/deny requests
How precisely could the Apache module be made to communicate with a standalone program without introducing a huge performance hit (without reading/writing files)?
For example, apr_file_open can be used to write files from an Apache module, but that would be very slow.

Apache Camel: How do I signal to other processes that I moved/renamed a file?

I am trying to develop a file receipt process using Camel. What I am trying to do seems simple enough:
receive a file
invoke a web service which will look at that file and generate some metadata
move the file to a new location based on that metadata
invoke subsequent process(es) which will act on the file in its new location
I have tried several different approaches, but none work exactly as I would like. My main issue is that the file is not moved/renamed until the route has completed, so I cannot signal to any downstream process from within that route that the file is available.
I need to invoke web services to determine the new name and location; once I do that, the body is changed and I cannot use a file producer to move the file from within the route.
I would really appreciate hearing any other solutions.
You can signal the processing routes and then have them poll using the doneFile functionality of the file component.
Your first process will copy the files, signal the processing routes, and when it is done copying the file it will write a done file. Once the done file has been written the file consumers in your processing routes will pick up the file you want to process. This guarantees that the file is written before it is processed.
Check out the "Using done files" section of the file component.
http://camel.apache.org/file2.html
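A sketch of the done-file handshake in the Java DSL (directories and endpoints are illustrative):

import org.apache.camel.builder.RouteBuilder;

public class DoneFileRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Receipt route: write the payload, then a done file named after
        // it. Camel creates the done file only after the target file has
        // been completely written.
        from("direct:receipt")
            .to("file:/data/ready?doneFileName=${file:name}.done");

        // Processing route: only picks up files whose done file exists,
        // so it can never see a half-written file.
        from("file:/data/ready?doneFileName=${file:name}.done")
            .to("direct:process");
    }
}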
With other components you could have used the OnCompletion DSL syntax to trigger a post-route message for further processing.
However, with the file component this is not really doable, since the move/done step happens in parallel with that OnCompletion trigger, and you can't be sure that the file is really done.
You might have some luck with the Unit of Work API, which can register post-route execution logic (this is how the file component fires off the move when the route is done).
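A sketch of that approach, using Camel's Synchronization callback (where exactly the processor sits in the route is an assumption):

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.spi.Synchronization;

public class NotifyWhenDone implements Processor {
    @Override
    public void process(Exchange exchange) {
        exchange.getUnitOfWork().addSynchronization(new Synchronization() {
            @Override
            public void onComplete(Exchange exchange) {
                // runs once the exchange is done; note that ordering
                // relative to the file component's own move callback
                // is not guaranteed
            }

            @Override
            public void onFailure(Exchange exchange) {
                // the route failed; nothing to signal downstream
            }
        });
    }
}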
However, do you really need this logic?
I see that you might want to send a wake-up call to some file consumer, but does the file really have to be ready that very millisecond? Can't you just start to poll for the file and grab it once it is ready, after you have received the trigger message? That's usually how you do things with file-based protocols (or just ignore the trigger and poll every now and then).
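For the poll-on-trigger idea, a sketch using a ConsumerTemplate (URIs and the timeout are illustrative):

import org.apache.camel.ConsumerTemplate;
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;

public class PollOnTrigger extends RouteBuilder {
    @Override
    public void configure() {
        from("activemq:queue:file.ready")
            .process(exchange -> {
                // in real code, create the template once and reuse it
                ConsumerTemplate consumer =
                        exchange.getContext().createConsumerTemplate();
                // block until the expected file appears, or give up
                // after 30 seconds (receive returns null on timeout)
                Exchange file = consumer.receive(
                        "file:/data/ready?fileName=abc.csv", 30000);
                if (file != null) {
                    exchange.getIn().setBody(file.getIn().getBody());
                }
            });
    }
}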

How to write to a file synchronously after fork()?

I am going to implement a server in C as a course project. The server should serve more than one client simultaneously. The project description states that fork() should be used to serve more than one client. Each child should write something to a common file. How can this be handled synchronously? Is there any mechanism, like in Java, where only one thread can use a function at a time?
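(For reference, the Java mechanism the question alludes to is the synchronized keyword; a minimal illustration follows. Bear in mind it serialises threads within one JVM, whereas fork() creates separate processes that need kernel-level coordination such as advisory file locks.)

import java.io.FileWriter;
import java.io.IOException;

public class SharedLog {
    private final FileWriter writer;

    public SharedLog(String path) throws IOException {
        writer = new FileWriter(path, true); // open in append mode
    }

    // only one thread at a time can execute this method on a given
    // instance; the JVM's monitor provides the mutual exclusion
    public synchronized void appendLine(String line) throws IOException {
        writer.write(line + System.lineSeparator());
        writer.flush();
    }
}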

Runtime information in C daemon

The user, administrators and support staff need detailed runtime and monitoring information from a daemon developed in C.
In my case this information includes, e.g.:
the current system health, like throughput (MB/s), data already written, ...
the current configuration
I would use JMX in the Java world and the procfs (or sysfs) interface for a kernel module. A log file doesn't seem to be the best way.
What is the best way to provide such an information interface for a C daemon?
I thought about opening a socket and implementing a bare-metal HTTP or XML-RPC server, but that seems to be overkill. What are the alternatives?
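(For comparison, the JMX approach the question cites from the Java world looks roughly like this; the names and figures are illustrative:)

import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

// DaemonStatsMBean.java: the management interface (one public type per
// file in real code; both shown together here for brevity)
public interface DaemonStatsMBean {
    double getThroughputMBps();
    long getBytesWritten();
}

// DaemonStats.java: implementation plus registration; jconsole or any
// JMX client can then read the attributes while the process runs
public class DaemonStats implements DaemonStatsMBean {
    public double getThroughputMBps() { return 12.5; } // placeholder value
    public long getBytesWritten()     { return 0L; }   // placeholder value

    public static void main(String[] args) throws Exception {
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new DaemonStats(), new ObjectName("app:type=DaemonStats"));
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive for inspection
    }
}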
You can use a signal handler in your daemon that reacts to, say, USR1 and dumps information to the screen/log/network. This way, you can just send the process a USR1 signal whenever you need the info.
You could listen on a UNIX-domain socket and regularly write the current status (say, once a second) to anyone who connects to it. You don't need to implement a protocol like HTTP or XML-RPC; since the communication is one-way, just write a single line of plain text containing the state at regular intervals.
If you are using a relational database anyway, create another table and fill it with the current status as frequently as necessary. If you don't have a relational database, write the status to a file, and implement some rotation scheme to avoid overwriting a file that somebody is reading at that very moment.
Write to a file. Use a file-locking protocol to force atomic reads and writes. Anything you agree on will work. There's probably a UUCP locking library floating around that you can use; in a previous life I found one for Linux. I've also implemented it from scratch, and that is fairly trivial to do too.
Check out the lockdev(3) library on Linux. It's for devices, but it may work for plain files too.
I like the socket idea best. There's no need to support HTTP or any RPC protocol. You can create a simple application specific protocol that returns requested information. If the server always returns the same info, then handling incoming requests is trivial, though the trivial approach may cause problems down the line if you ever want to expand on the possible queries. The main reason to use a pre-existing protocol is to leverage existing libraries and tools.
Speaking of leveraging, another option is to use SNMP and access the daemon as a managed component. If you need to query/manage the daemon remotely, this option has its advantages, but otherwise it can turn out to be even more of an overkill than an HTTP server.
