Linux rsync archive flag over NFS mount anomaly

I need to rsync a small directory from an NFS share in the US to an NFS share in India at least once per day, via a mounted directory at the source end. I want to retain all attributes, so I'm using the archive flag, but the owner is changed to me at the receiving end. I am running the job as my user ID, not root, but that shouldn't matter, should it?
Regards
-John

Yes it does matter. See here
https://serverfault.com/questions/514118/mapping-uid-and-gid-of-local-user-to-the-mounted-nfs-share
The user ID under which you've mounted the remote host may be different from the one you're logged in as on the local host. Also note that preserving ownership (the -o implied by -a) requires the receiving side to be able to chown the files, which ordinarily means running rsync as root there.

Camel file reading: race condition with 2 active servers

In our ESB project, we have a lot of routes reading files with the file2 or ftp protocol for further processing. It is important to note that the files we read locally (file2 protocol) sit on network shares mounted via different protocols (NFS, SMB).
Now we are facing issues with race conditions: both servers read the file and process it. We have reduced the likelihood of that by using the preMove option, but from time to time the duplicate reading still occurs when both servers poll at the same millisecond. According to the documentation, an idempotentRepository together with readLock=idempotent could help, for example backed by Hazelcast.
However, I'm wondering if this is a suitable solution for my issue, as I don't really know if it will work in all cases. Both servers read the file within milliseconds of each other, so the information that one server has already processed the file needs to be available in the Hazelcast grid at the point in time when the second server tries to read. Is that possible? What happens if there are small latencies (e.g. network-related)?
In addition, the readLock=idempotent setting is only available for file2 but not for ftp. How can that issue be solved there?
Again: the issue is not preventing duplicate files in general, it is solely about preventing the race condition.
AFAIK the idempotent repository should prevent both consumers in your case from reading the same file.
The latency between detection of the file and the entry in Hazelcast is not relevant, because the file consumers do not record what they have read after the fact. Instead, they both ask the repository for an exclusive read-lock. The first one wins; the second one is denied, so it continues to the next file.
If you want to minimize the potential for conflicts between the consumers, you can set shuffle=true to randomize the order in which files are consumed.
For the problem with the missing readLock=idempotent on the ftp consumer: you could perhaps build a separate transfer route with only one consumer that downloads the files. Then your file-consumer route can process them idempotently.

How to get an NFS filehandle?

I am trying to do some testing of several thousand NFSv3 fileserver exports across hundreds of servers. Lots of things can go wrong, from configurations on the server to network connectivity. The most complete test I can do is to actually try to mount it on a client.
I can do that, but actually mounting everything is more than I need, takes state and resources beyond the program's execution, and tends to stress the client a bit. I've more than once seen problems that seemed to indicate something on the client was unhappy and preventing a mount from happening. (With no changes other than a client reboot, mounts worked again.)
I was hoping instead to code something lighter weight that would simply act as an NFS client and see if the NFS MOUNT call successfully returned a filehandle. If it did, my server is running and my client is authorized. But I've not found any simple code to do so.
When I look at the Linux source, it looks like at least some of the relevant code is tied to it being a kernel module, which is confusing.
Is there some user-space code that just requests an NFS filehandle via a MOUNT call that I might be able to strip down? (Or is there any reason my idea wouldn't work?) This is all AUTH_SYS, so I don't have to get Kerberos tickets or anything.
Without knowing more, I will speculate a little based on my knowledge of NFS/Linux file systems.
I assume your client is Linux (but the same logic could apply to Windows if it had an NFS client present).
It sounds like the mounts are consuming resources to the point that the client cannot mount any more NFS file systems. That makes sense: when you reboot, it starts working again, and a reboot drops the NFS mounts (assuming that you are mounting explicitly/programmatically), allowing mounts to succeed again. I suspect you are mounting the NFS file systems and never unmounting them. So I would suggest the following:
mount
access a file or directory in the just-mounted NFS file system
unmount the NFS file system
Well, then just get the RPC protocol definition files for the version of NFS you are running (probably version 3 or 4) and run them through the rpcgen protocol compiler. You will get client functions in C that you can compile and use to make calls to the server. They will still execute several system calls; there is no way to do network communication on Linux without that happening, and the traffic will go through the TCP/IP stack in the Linux kernel (you will probably use UDP). You can probably find the NFS protocol definition files on the Sun/Oracle website, or in the source of a Linux distribution. You will be making application-layer calls, but the client stubs call RPC library functions, which in turn make Linux system calls that go to the kernel.
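For illustration, here is a rough sketch of what such an rpcgen-based client could look like. It assumes stubs generated from the standard mount.x protocol file; the identifiers MOUNT_PROGRAM, MOUNT_V3, dirpath, mountres3, MNT3_OK, mountproc3_mnt_3 and mountproc3_umnt_3 come from that file and from rpcgen's naming, so treat them as assumptions to check against your generated mount.h:

    /* Hypothetical sketch: ask the server's mountd for a filehandle without
     * actually mounting anything. Build roughly as:
     *   rpcgen mount.x && cc check_mnt.c mount_clnt.c mount_xdr.c -ltirpc */
    #include <rpc/rpc.h>
    #include <stdio.h>
    #include "mount.h"                 /* generated by rpcgen from mount.x */

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <server> <export-path>\n", argv[0]);
            return 2;
        }

        CLIENT *clnt = clnt_create(argv[1], MOUNT_PROGRAM, MOUNT_V3, "udp");
        if (clnt == NULL) {
            clnt_pcreateerror(argv[1]);
            return 2;
        }
        clnt->cl_auth = authunix_create_default();  /* AUTH_SYS, as in the question */

        dirpath path = argv[2];
        mountres3 *res = mountproc3_mnt_3(&path, clnt);
        if (res == NULL) {
            clnt_perror(clnt, "MNT call failed");
            return 2;
        }
        if (res->fhs_status != MNT3_OK) {
            fprintf(stderr, "server refused the export: status %d\n",
                    (int)res->fhs_status);
            return 1;
        }
        printf("got a filehandle for %s\n", argv[2]);

        mountproc3_umnt_3(&path, clnt);   /* be polite: we are not keeping it */
        auth_destroy(clnt->cl_auth);
        clnt_destroy(clnt);
        return 0;
    }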

How to release Linux lock files for unique daemon process and multiple users

I have a daemon of which only one instance should be running at a time. The daemon is part of a larger application. I implemented this as follows:
open() /tmp/prog.pid with O_CREAT | O_RDWR, permissions 0666. The permissions actually become 0664, probably because of the umask (?)
flock() on the file descriptor returned by open(), with LOCK_EX | LOCK_NB
This is all I had at first. My daemon exits on SIGTERM and SIGINT, but it turned out that the lock was not released upon exit. I realized with the help of man 1 flock (strangely not mentioned in man 2 flock) that manual unlocking might be necessary if "the enclosed command group may have forked a background process which should not be holding the lock". This is exactly the case since I am working on a daemon, so I now unlock manually at exit.
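For reference, a minimal sketch of that scheme; the path, mode and signal handling are illustrative, not the actual application code:

    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    static int lock_fd = -1;

    static void release_lock_and_exit(int sig)
    {
        (void)sig;
        if (lock_fd >= 0) {
            flock(lock_fd, LOCK_UN);   /* the explicit unlock described above */
            close(lock_fd);
        }
        _exit(0);
    }

    int main(void)
    {
        lock_fd = open("/tmp/prog.pid", O_CREAT | O_RDWR, 0666);
        if (lock_fd < 0) {
            perror("open");
            return 1;
        }
        if (flock(lock_fd, LOCK_EX | LOCK_NB) < 0) {
            fprintf(stderr, "another instance already holds the lock\n");
            return 1;
        }
        dprintf(lock_fd, "%d\n", (int)getpid());   /* record PID for the stop script */

        signal(SIGTERM, release_lock_and_exit);
        signal(SIGINT, release_lock_and_exit);

        /* ... daemon work ... */
        pause();
        return 0;
    }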
Now to my problem: there are several users who might be running the daemon.
If user1 is running the daemon, I want user2 to be able to kill it and restart it as themselves.
The locked file /tmp/prog.pid has permissions 0664, owner user1, group user1.
A stop script, prog_stop, kills all the processes involved in the application (it requires superuser rights; I'm OK with that). It also kills the daemon. When user2 runs prog_stop, the lock is released (I believe), but user2 cannot start their own daemon process, because they are neither the owner of the lock file nor in its group.
Several possible solutions:
make the lock file 0666, writable by all. Dangerous.
create a group that users need to be in to run the application. This requires that all users start the application with this group, probably with the help of newgrp. Easy to forget, and not easy to enforce. Possibly set the current group in the scripts used to start the application?
completely delete the lock file in prog_stop. Drawback: the path string is defined in the C file that opens the lock, so I would need to write (and maintain!) exactly the same file name and path in the stop script.
Lock files for daemons must be very common. What is the standard way to deal with this problem?
The standard way for lock files is to turn the daemon into a service and require sudo (or becoming root by other means) to start and stop it.
Now you can give the file a certain group; users in this group can then modify it. They can use newgrp, but it's better to add them to the group with usermod --append --groups=foo bar (to add user bar to the group foo; the user keeps their original GID and all other groups they had). After logging out and back in, you can validate this with id bar.
This is all very tedious. When I need something like that, I create a socket. Sockets are killed with the process that created them (so no cleanup necessary). Sockets can also be used to communicate with the running daemon (give me your status, shutdown, maybe even restart, ...).
I'm using a default port number which I compile into the application but I also use an environment variable to override the default.
Just make sure that you create a socket which listens on localhost only; otherwise anyone on the Internet might be able to talk to it.
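A rough sketch of that pattern; the default port number and the environment variable name are made up for illustration:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int port = 5123;                          /* assumed compile-time default */
        const char *env = getenv("PROG_PORT");    /* hypothetical override */
        if (env != NULL)
            port = atoi(env);

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);   /* localhost only */

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind");   /* EADDRINUSE: another instance is already running */
            return 1;
        }
        listen(fd, 8);

        /* The socket vanishes with the process, so no stale lock is left behind.
         * An accept loop for status/shutdown commands would go here. */
        pause();
        return 0;
    }

Note that SO_REUSEADDR is deliberately not set, so a second instance reliably fails with EADDRINUSE.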

How to restrict write access to a Linux directory by process attributes?

We've got a situation where it would be advantageous to limit write access to a logging directory to a specific subset of user processes. These particular processes (say, for example, telnet and the like) have been modified by us to generate a logging record whenever a significant user action takes place (like a remote connection, etc). What we do not want is for the user to manually create these records by copying and editing existing logging records.
syslog comes close but still allows the user to generate spurious records; SELinux seems plausible but has a terrible reputation for being an unmanageable beast.
Any insight is appreciated.
Run a local logging daemon as root. Have it listen on a Unix domain socket (typically /var/run/my-logger.socket or similar).
Write a simple logging library, where event messages are sent to the locally running daemon via the Unix domain socket. With each event, also send the process credentials via an ancillary message. See man 7 unix for details.
When the local logging daemon receives a message, it checks for the ancillary message and, if there is none, discards the message. The uid and gid in the credentials tell exactly who is running the process that sent the logging request; they are verified by the kernel itself, so they cannot be spoofed (unless you have root privileges).
Here comes the clever bit: the daemon also checks the PID in the credentials and, based on its value, /proc/PID/exe. That is a symlink to the actual binary being executed by the process that sent the message, something the user cannot fake. To fake a message, they'd have to overwrite the actual binaries with their own, and that should require root privileges.
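A sketch of that daemon-side check, assuming SO_PASSCRED has already been enabled on the receiving socket during setup; the buffer sizes and the allowed binary path are made up for illustration:

    #define _GNU_SOURCE            /* for struct ucred / SCM_CREDENTIALS */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int handle_one_message(int sock)
    {
        char data[4096], cbuf[CMSG_SPACE(sizeof(struct ucred))];
        struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cbuf, .msg_controllen = sizeof cbuf,
        };

        ssize_t n = recvmsg(sock, &msg, 0);
        if (n <= 0)
            return -1;

        struct ucred *cred = NULL;
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
            if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS)
                cred = (struct ucred *)CMSG_DATA(c);
        if (cred == NULL)
            return -1;                              /* no credentials: discard */

        /* Resolve the binary of the sending process via /proc/PID/exe. */
        char link[64], exe[4096];
        snprintf(link, sizeof link, "/proc/%d/exe", (int)cred->pid);
        ssize_t len = readlink(link, exe, sizeof exe - 1);
        if (len < 0)
            return -1;
        exe[len] = '\0';

        /* Hypothetical allow list: only our modified telnet may log events. */
        if (strcmp(exe, "/usr/local/bin/telnet") != 0)
            return -1;

        printf("event from uid=%u via %s: %.*s\n",
               (unsigned)cred->uid, exe, (int)n, data);
        return 0;
    }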
(There is a possible race condition: a user may craft a special program that does the same and immediately exec()s a binary they know to be allowed. To avoid that race, you may need to have the daemon respond after checking the credentials and have the logging client send another message (with credentials), so the daemon can verify that the credentials are still the same and the /proc/PID/exe symlink has not changed. I would personally use this second round trip to check the message's veracity: the logger asks for confirmation of the event with a random cookie, and the requester responds with both the event checksum and the cookie. Including the random cookie should make it impossible to stuff the confirmation into the socket queue before the exec().)
With the PID you can also do further checks. For example, you can trace the process parentage to see how the human user connected, by tracking parents until you detect a login via ssh or the console. It's a bit tedious, since you'll need to parse /proc/PID/stat or /proc/PID/status files, and it is not portable. OS X and the BSDs have a sysctl call you can use to find out the parent process ID, so you can make it portable by writing a platform-specific parent_process_of(pid_t pid) function.
This approach will make sure your logging daemon knows exactly 1) which executable the logging request came from, and 2) which user (and how connected, if you do the process tracing) ran the command.
As the local logging daemon is running as root, it can log the events to file(s) in a root-only directory, and/or forward the messages to a remote machine.
Obviously, this is not exactly lightweight, but assuming you have fewer than a dozen events per second, the logging overhead should be completely negligible.
Generally there are two ways of doing this. One is to run these processes as root and write-protect the directory (mentioned mainly for historical purposes); then no one but root can write there. The second, and more secure, is to run them as another user (not root) and give that user, but no one else, write access to the log directory.
The approach we went with was to use a setuid binary to allow write access to the logging directory. The binary was executable by all users but would only allow a log record to be written if the parent process path, as given by /proc/$PPID/exe, matched the subset of modified binary paths we placed on the system.
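A rough sketch of such a setuid helper; the log path and the allowed binary paths are placeholders, and the race caveat from the previous answer (a parent exec()ing right after the check) applies here as well:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char link[64], parent_exe[4096];
        snprintf(link, sizeof link, "/proc/%d/exe", (int)getppid());
        ssize_t len = readlink(link, parent_exe, sizeof parent_exe - 1);
        if (len < 0)
            return 1;
        parent_exe[len] = '\0';

        /* Only the binaries we modified may write a record. */
        if (strcmp(parent_exe, "/usr/local/bin/telnet") != 0 &&
            strcmp(parent_exe, "/usr/local/bin/rlogin") != 0)
            return 1;

        /* This binary is setuid to the log owner, so the open succeeds even
         * though ordinary users cannot write to the directory themselves. */
        int fd = open("/var/log/app/actions.log", O_WRONLY | O_APPEND);
        if (fd < 0 || argc < 2)
            return 1;
        dprintf(fd, "%s: %s\n", parent_exe, argv[1]);
        close(fd);
        return 0;
    }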

Ensure that file state on the client is in sync with NFS server

I'm trying to find a proper way to handle stale data on an NFS client. Consider the following scenario:
Two servers mount the same NFS shared storage containing a number of files
The client application on server 1 deletes some files
The client application on server 2 tries to access the deleted files and fails with: Stale NFS file handle (nothing strange, the error is expected)
(Also, it may be useful to know that the cache-related mount options are set quite high on both servers for performance reasons.)
What I'm trying to understand is:
Is there a reliable method to check that a file is present? In the scenario above, lstat on the file returns success and the application fails only when it tries to move the file.
How can I manually sync the contents of a directory on the client with the server?
Any general advice on how to write reliable file management code on top of NFS?
Thanks.
Is there a reliable method to check that a file is present? In the scenario above, lstat on the file returns success and the application fails only when it tries to move the file.
That's just normal NFS behavior.
How can I manually sync the contents of a directory on the client with the server?
That is impossible to do manually, since NFS pretends to be a normal POSIX-compliant file system.
I once tried coding a close()/open() sequence in an attempt to somehow mitigate the effects of NFS client-side caching. In my case I needed to read information written to the file on the other server. But even the reopen trick had close to zero effect. And I couldn't add fdatasync() to the writing side, since that slows the whole application down.
My experience with NFS to date is that there is nothing you can do about it. In critical code paths I simply retry the file operations that return ESTALE.
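A minimal sketch of that retry pattern (the operation and the retry count are arbitrary examples):

    #include <errno.h>
    #include <stdio.h>

    #define ESTALE_RETRIES 3

    /* Retry an operation that may fail with ESTALE; re-doing the path lookup
     * on the next attempt forces the client to fetch a fresh file handle. */
    static int rename_retrying(const char *from, const char *to)
    {
        for (int attempt = 0; attempt < ESTALE_RETRIES; attempt++) {
            if (rename(from, to) == 0)
                return 0;
            if (errno != ESTALE)
                break;             /* a different error: give up immediately */
        }
        return -1;
    }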
Any general advice on how to write reliable file management code on top of NFS?
Mod me down all you want, but if your customers want reliability then they shouldn't use NFS.
My company, for example, recommends a proper distributed file system (I intentionally omit the brand) if the customer wants reliability. Our core software is not guaranteed to run on NFS and we do not support such configurations. But in our case we really need the guarantee that as soon as data is written to the file system it becomes accessible on all other nodes.
Coherency in NFS can be achieved, but at the cost of performance, making NFS barely usable. (Check its mount options.) NFS caches like crazy to hide the fact that it is a remote file system. To make all operations coherent, the NFS client would have to go to the NFS server synchronously for every little operation, bypassing the local cache. And that would never be fast.
But since we are talking Linux here, one can advise customers of the software to evaluate the available cluster file systems. For example, Red Hat now officially supports GFS. I have heard about people using CodaFS, but have no hard information on it.
I have had success with doing ls -l on the directory which contains the file.
You could try the noac mount option. From man nfs:
In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead.
You could have two mounts: one for critical, fast-changing data that you need synchronized, and another mount for other data.
Also, look into NFS locking and its limitations.
As for general advice:
One way to truncate or rewrite a file that is concurrently read from multiple hosts is to write the new content into a temporary file and then rename that file to the final location.
On the same filesystem this operation should be atomic.
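A short sketch of that write-then-rename pattern (file names and mode are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int replace_file(const char *final_path, const char *data, size_t len)
    {
        char tmp_path[4096];
        /* Temporary file in the same directory, so rename() stays atomic. */
        snprintf(tmp_path, sizeof tmp_path, "%s.tmp.%d", final_path, (int)getpid());

        int fd = open(tmp_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        close(fd);

        /* Readers on other hosts see either the old content or the new,
         * never a half-written file. */
        if (rename(tmp_path, final_path) != 0) {
            unlink(tmp_path);
            return -1;
        }
        return 0;
    }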
