Currently I am using Git, through the command line, to transfer data files (.csv) from my Google Cloud VM instance (running Linux) to my local machine. However, there is a limit of 25 MB per file on GitHub, and the files will be up to 1 GB.
Are there other straightforward methods to do this? Maybe I can add a couple of lines to the code and push the CSV to a database, but I have not come across a simple way to do so yet.
Yes. On Linux you have many options, but scp is probably the most straightforward.
If you can ssh to the instance directly, say with ssh user@host or (with a key) ssh -i key user@host, then you can also securely copy files with very similar commands:
scp -i key user@host:source_path/remote_file . copies the remote file source_path/remote_file to the current folder, or vice versa:
scp -i key local_file user@host:destination_path copies local_file from the current local folder to the remote destination_path.
Keep in mind that user must have the proper privileges to access the remote path/file in both cases. Archiving the file beforehand can help as well, especially with .csv files, since plain text usually compresses well (for example, tar cvzf my_archive.tar.gz my_csv_file.csv).
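A minimal sketch of the compress-then-copy idea, assuming gzip and tar are available on the VM. data.csv and its rows are hypothetical stand-ins for the real export, and key, user@host and source_path are the same placeholders as in the scp commands.

```shell
#!/bin/sh
# Compress a CSV on the VM before copying it off (sketch; data.csv is a hypothetical file).
set -eu

# Stand-in data so the sketch runs on its own; replace with the real export.
printf 'id,value\n1,42\n2,43\n' > data.csv

# Single file: -c writes to stdout, so data.csv itself is left untouched.
gzip -9 -c data.csv > data.csv.gz

# Several files: bundle and compress them into one archive (z = gzip).
tar czf my_archive.tar.gz data.csv

# Then, on the local machine (placeholders as in the scp commands):
#   scp -i key user@host:source_path/data.csv.gz .
#   gunzip data.csv.gz
```

gzip alone is enough for a single file; tar only matters when several files should travel as one archive.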
Note: if you have a bad network connection that breaks during such a large transfer, or a bunch of files that have not changed but are still part of the copy, rsync might be a better option. There are certainly many more options depending on your actual requirements.
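For the flaky-connection case, one possible sketch is a small generic retry wrapper around rsync with --partial, which keeps a partially transferred file so a later attempt can reuse the data that already arrived. The retry function and the RETRY_DELAY variable are hypothetical helpers for this sketch, not part of rsync; the rsync line itself is left as a comment because it needs a real host.

```shell
#!/bin/sh
# retry N CMD [ARGS...]: run CMD up to N times, pausing between failed attempts.
# RETRY_DELAY (seconds, default 5) is a hypothetical knob for this sketch.
retry() {
  max=$1
  shift
  attempt=1
  while :; do
    "$@" && return 0                    # success: stop retrying
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Hypothetical usage (key, user@host and paths are placeholders). --partial keeps a
# partially transferred file, so the next attempt can reuse what already arrived:
#   retry 5 rsync -avz --partial --progress -e "ssh -i key" \
#       user@host:source_path/remote_file.csv .
```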
I'm working on a C project that makes connections to remote servers. Commonly, this involves using some small terminal macros I've added to my makefile to scp an executable to that remote server. While convenient, the only part of this I've not been able to readily streamline is the part where I need to enter the password.
Additionally, in my code, I'm already using system() calls to accomplish some minor terminal commands (like sort). I'd ALSO like to be able to enter a password if necessary here. For instance, if I wanted to build a string in my code to scp a local file to my remote server, it'd be really nice to have my code pull (and use) a password from somewhere so it can actually access that server.
Does anyone a little more experienced with Make know a way to build passwords into a makefile and/or a system() call in C? Bonus points if I can do it without any third-party software/libraries. I'm trying to keep this as self-contained as possible.
Edit: In reading responses, it's looking like the best strategy is to establish a preexisting ssh key relationship with the server to avoid the login process via something more secure. More work up front for less work in the future, by the sound of it, with additional security.
Thanks for the suggestions, all.
The solution is to not use a password. SSH, and thus SCP, supports public key authentication (among many other methods), which is described all over the internet. Use that.
Generally, the problem you're trying to solve is called secret management, and the takeaway is that your authentication secrets (passwords, private keys, API keys…) should not be owned by your application software, but by whatever configures the authentication layer. In other words, the way forward really is to let SSH connect on its own, without you entering a password, by choosing an authentication method that is not interactive. So a password is less elegant here than the generally preferable approach of authenticating to your server with a public key.
Passing passwords as command-line options is generally a bad idea: it leaks them into things like process listings, potentially log entries, and so on. Don't do it.
Run ssh-keygen to create the keys, then append the local system's public key file (e.g. .ssh/id_rsa.pub) to the remote's .ssh/authorized_keys file. That is the best way to go.
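The append step can be sketched as a small shell function that adds a public key to an authorized_keys file only if that exact line is not already there, which is roughly what ssh-copy-id takes care of on the remote side. The function name add_key_once, the demo paths and the fake key line are hypothetical, and the demo runs against throwaway local files rather than a real server; the real-world commands are shown as comments, using an ed25519 key instead of the id_rsa.pub mentioned above.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file only if it is not already there.
# Sketch: the demo uses throwaway paths; on a real remote the target is ~/.ssh/authorized_keys.
set -eu

add_key_once() {
  pub=$1
  auth=$2
  key=$(cat "$pub")
  mkdir -p "$(dirname "$auth")"
  chmod 700 "$(dirname "$auth")"
  touch "$auth"
  chmod 600 "$auth"   # sshd's StrictModes check rejects key files others can write to
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -e "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
}

# Demo with a fake key line (not a real key).
demo=$(mktemp -d)
printf 'ssh-ed25519 AAAAexamplekey user@laptop\n' > "$demo/id_ed25519.pub"
add_key_once "$demo/id_ed25519.pub" "$demo/ssh/authorized_keys"
add_key_once "$demo/id_ed25519.pub" "$demo/ssh/authorized_keys"   # no-op the second time

# Real-world equivalent (user@host is a placeholder):
#   ssh-keygen -t ed25519                            # once, on the local machine
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub user@host
```

Once the key is installed, scp calls from a Makefile or from system() in C no longer prompt for anything.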
But I had remote systems that I needed to access without passwords where the key file had not been created on the remote (requiring ssh-keygen to be run there), or where the remote .ssh/authorized_keys file did not contain my local system's public key.
I wanted a one-time, automated/unattended script to add it. A chicken-and-egg problem, since installing the key requires logging in, which is exactly the step that needs the password.
I found sshpass.
It works like ssh and supplies the password non-interactively (similar to what expect does).
I installed it once on the local system.
Using this, the script would:
Run ssh-keygen on the remote [if necessary]
Append the local .ssh/id_rsa.pub public key file to the remote's .ssh/authorized_keys
Copy back the remote's .ssh/id_rsa.pub file to the local system's .ssh/authorized_keys file [if desired]
Then, ssh etc. worked without any passwords.
UPDATE:
ssh-copy-id is your friend, too.
I had forgotten about that. But when I was doing this, I had more complex requirements.
The aforementioned script would merge/combine all the public keys and update all the authorized_keys files on all the systems. This would be repeated anytime any new system was added to the mix.
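The merge step can be sketched like this, assuming each machine's public key has already been collected into one directory as <host>.pub (a hypothetical layout). merge_keys is an illustrative name, not the author's actual script, and the key lines in the demo are fake.

```shell
#!/bin/sh
# Sketch of the "merge all public keys" step: combine every machine's public key into
# one authorized_keys file. The one-<host>.pub-per-machine layout is a hypothetical convention.
set -eu
export LC_ALL=C   # deterministic sort order

merge_keys() {
  key_dir=$1
  out=$2
  # Concatenate all keys, drop blank lines, remove exact duplicates.
  cat "$key_dir"/*.pub | grep -v '^[[:space:]]*$' | sort -u > "$out"
  chmod 600 "$out"
}

# Demo with fake key lines; the desktop key shows up twice.
demo=$(mktemp -d)
mkdir "$demo/keys"
printf 'ssh-ed25519 AAAAdesktop me@desktop\n' > "$demo/keys/desktop.pub"
printf 'ssh-ed25519 AAAAtarget1 me@target1\n' > "$demo/keys/target1.pub"
printf 'ssh-ed25519 AAAAdesktop me@desktop\n\n' > "$demo/keys/desktop-copy.pub"
merge_keys "$demo/keys" "$demo/authorized_keys"

# Distribution (hosts are placeholders). This overwrites each remote file, so any key
# not in the merged set is dropped:
#   for h in target1 target2; do scp "$demo/authorized_keys" "$h:.ssh/authorized_keys"; done
```

Rerunning the merge and distribution whenever a machine is added keeps every authorized_keys file in step.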
You never need to run ssh-keygen on a remote host, especially not to generate an authorized_keys file. (Comment by Marcus Müller)
I think that was inferred from context rather than stated as a requirement. I hope the answer wasn't downvoted for that.
Note that step (1), running ssh-keygen on the remote, is needed for step (3), copying back the remote's public key.
Ironically, one of the tutorial pages for ssh-copy-id says to run ssh-keygen first ...
It's been my experience when setting up certain types of systems/clusters (e.g. a development host/PC and several remote/target/test ones) that if one wants to do local-to-remote actions, one invariably also wants to do:
remote-to-local actions -- e.g. I'm ssh'ed into a remote system and want to rcp back to the development system.
The remote system needs to do a git clone/pull from [and, sometimes, git push to] the local git server.
remote-to-remote -- copying/streaming data between target systems.
This requires that each system have a private/public key pair and all systems have an authorized_keys file that has the public keys of all the other systems.
When I've not set up the systems that way it usually comes back to haunt me [usually late at night when I'm tired]. So, I just [axiomatically] set it up that way at the outset.
This was one of the reasons I developed the script in the first place. Also, since we didn't want to maintain a fork of a given system/distro installer for production systems, we would:
Use the stock/standard distro installer CD/USB
Use the script to add the extra/custom config, S/W, drivers, etc.
I have a shell script that randomly generates a location and copies some files to it.
I also have a separate C program that needs to access this randomly generated location to read the copied files.
However, the shell script and the C program run independently (the shell script first, then the C program). The C program is launched by a third application, so it is impossible to pass the location to it directly.
How can I securely save this "randomly generated location" somewhere the C program can access it?
I am running these scripts on macOS and would prefer a solution that keeps this data in memory, or at least does not create a file in a common location (like /tmp, /var/tmp, etc.).
There are various ways to share the information. Personally I don't find saving to a file to be a problem, since you can use the filesystem's access control to limit access, and/or encrypt the file.
However, specifically on macOS there are some other ways, such as User Defaults (accessible from command-line with defaults), and Keychain (accessible from command-line with security).
Saving to user defaults is effectively saving to a file (accessible by that user), so for security (other than through obscurity) you would still need to encrypt the data. Meanwhile Keychain is built for storing things securely, but setting up access to it is more difficult (and you may inadvertently grant your shell interpreter permanent access).
Still, it may be worthwhile to try something like:
security add-generic-password -a myUserName -s myService -w '/foo/bar/baz'
security find-generic-password -g -a myUserName -s myService
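For the plain-file route mentioned at the start of this answer, a minimal sketch of the shell side, assuming a hypothetical per-user directory ~/.myapp that only the owner can enter. The directory and the file name "location" are illustrative choices; the C program would read the path from ~/.myapp/location with ordinary fopen/fgets.

```shell
#!/bin/sh
# Record a randomly generated directory in an owner-only file (sketch).
# ~/.myapp and the file name "location" are hypothetical choices.
set -eu
umask 077                      # anything created below is owner-only by default

state_dir="$HOME/.myapp"
mkdir -p "$state_dir"
chmod 700 "$state_dir"

# Random location inside the private directory instead of a shared /tmp path.
location=$(mktemp -d "$state_dir/data.XXXXXX")
# ... copy the files into "$location" here ...

# Publish the path atomically: write a temp file, then rename it into place,
# so a reader never sees a half-written path.
tmp=$(mktemp "$state_dir/location.XXXXXX")
printf '%s\n' "$location" > "$tmp"
mv -f "$tmp" "$state_dir/location"
```

This relies only on filesystem permissions; as noted above, encrypting the contents or using Keychain is the step up if that is not enough.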
We're trying to find a good way to copy code from one "deployment" computer to several "target" computers, ideally in parallel. The idea is that the deployment computer holds a copy of the files exactly as they are supposed to be copied to the target servers. We would like the copying to happen in parallel, as it might involve several tens of target servers.
Our current scheme uses rsync to synchronize the directory containing the files, in order to keep the target servers up to date with the deployment server.
So, the questions are:
What is a good / better way to do this?
What sort of tools are used to do this?
Should this problem be faced from a different angle or perspective that I'm totally missing?
Thanks very much!
Another option is pdsh, a parallel, distributed shell. It's available from EPEL, and allows running remote commands (via ssh) on multiple nodes in parallel. For example:
pdsh -w node10,node11,node12 command
Runs "command" on all three nodes in parallel. It also has a handy hostname expression feature to do the same thing with a bit less typing:
pdsh -w node[10-12] command
It also includes the pdcp command, which copies files to multiple nodes in parallel. (The pdsh package needs to be installed on all nodes for pdcp to work.)
pdcp -w node[10-12] /local/file /remote/dir/
The local file is copied to the /remote/dir on all three nodes.
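Without pdsh, a similar parallel push can be sketched with plain rsync and shell background jobs. SRC_DIR, DEST_DIR, HOSTS and the push_one function are hypothetical placeholders, and DRY_RUN defaults to 1 in this sketch so it only prints the commands until you set DRY_RUN=0.

```shell
#!/bin/sh
# Push one directory to many hosts in parallel with plain rsync + background jobs (sketch).
# SRC_DIR, DEST_DIR and HOSTS are hypothetical placeholders. DRY_RUN defaults to 1,
# which only prints the commands; set DRY_RUN=0 to really copy.
SRC_DIR=${SRC_DIR:-./release/}
DEST_DIR=${DEST_DIR:-/opt/app/}
HOSTS=${HOSTS:-"node10 node11 node12"}

push_one() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "rsync -az --delete $SRC_DIR $1:$DEST_DIR"
  else
    rsync -az --delete "$SRC_DIR" "$1:$DEST_DIR"
  fi
}

pids=""
for host in $HOSTS; do
  push_one "$host" &            # one background job per host
  pids="$pids $!"
done

failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))   # exit status of that host's copy
done
echo "hosts failed: $failed"
```

With several tens of hosts you may want to cap concurrency; this sketch starts every job at once.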
We use the lftp command to sync our remote web server to our local backup machine. We wrote a Bash script to automatically sync all backups on the server to the local box, and we set that script up as a cron job to run nightly.
rsync is a fine way of handling this, and I'd recommend moving your current process into a cron job if it isn't already.
Unison is also available for setting up two-way sync, if you require that functionality.
Hope this helps!
There is a program called clusterssh, available on Debian-based operating systems (I was also able to install it on RHEL 6.3 using an RPM after resolving other dependencies), that lets you open ssh terminals to multiple machines with a single input location, so you type once into as many machines as you have terminals open. Then you just have to use a simple scp. I have used this program to move a file from a development workstation to as many as 25 other workstations at the same time. However, this option is only really useful if you're trying to do exactly what you stated in the question, that is, copy files from one computer to several others.
This is not an effective syncing mechanism. If you really want it to sync then the above answer would be best.
I have a folder a/ and a remote folder A/.
I now run something like this in a Makefile:
get-music:
	rsync -avzru server:/media/10001/music/ /media/Incoming/music/
put-music:
	rsync -avzru /media/Incoming/music/ server:/media/10001/music/
sync-music: get-music put-music
When I run make sync-music, it first gets all the differences from the server to local, and then the opposite, sending all the differences from local to the server.
This works very well only as long as there are just updates or new files. If there are deletions, it doesn't propagate them.
rsync has the --delete and --delete-after options to help accomplish what I want, but the thing is, they don't work for a two-way sync.
Deleting server files during a sync when the corresponding local files have been deleted works. But if, for some reason (explained below), I have files that exist locally but not on the server because they were deleted there, I want them removed locally rather than copied back to the server (which is what currently happens).
The thing is, I have three machines involved:
desktop
notebook
home-server
So sometimes files will have been deleted from the server by a notebook sync, for example, and then, when I run a sync with my desktop (where those deleted files still exist), I want these files to be deleted and not copied back to the server.
I guess this is only possible with a database that keeps track of operations :P
Any simpler solutions?
Thank you.
Try Unison: http://www.cis.upenn.edu/~bcpierce/unison/
Syntax:
unison dirA/ dirB/
Unison asks what to do when files are different, but you can automate the process with the following, which accepts the default (non-conflicting) actions:
unison -auto dirA/ dirB/
unison -batch dirA/ dirB/ asks no questions at all, and writes to output how many files were ignored (because they conflicted).
Note: I am no longer using Unison (I use NextCloud, which doesn't address the original use case). However, note that rsync is not designed for bidirectional sync, while Unison is. Unison may have its bugs (as any other piece of software) and its wrinkles. I am surprised it now seems to be actively maintained (the last time I looked, I thought it looked dead), but I'm not sure what its state is nowadays. I haven't needed a two-way file synchronizer since, so there may be better options.
Since the original question also involves a desktop and a laptop and an example involving music files (so the asker is probably using a GUI), I'd also mention one of the best bidirectional, multi-platform, free and open source programs to date: FreeFileSync.
It's GUI based, very fast and intuitive, and comes with filtering and many other options, including the ability to connect remotely, to view and interactively manage "collisions" (for example, files with similar timestamps), and to switch between bidirectional transfer, mirroring and so on.
FreeFileSync can easily sync two computers on the same network and also sync two computers on different and remote networks.
On same network: have FreeFileSync use the local file system on one side and a shared network drive / path on the other. On Windows systems you enable file / disk sharing on one computer and access that share from the other. I use FreeFileSync this way to keep my main development PC source code synced with my 2 laptops.
I have also synced one of these laptops with a Linux server with Samba installed and sharing one of its directories.
Across networks: create a VPN and do the same as above. FreeFileSync will see the remote disk as if it were on the local network. Or buy a router that lets you connect a USB disk to it and share it over the internet. I have installed a VPN on a remote Linux server and used it through the OpenVPN Windows client.
You could also try bitpocket: https://github.com/sickill/bitpocket
Try this:
get-music:
	rsync -avzru --delete-excluded server:/media/10001/music/ /media/Incoming/music/
put-music:
	rsync -avzru --delete-excluded /media/Incoming/music/ server:/media/10001/music/
sync-music: get-music put-music
I just tested this and it worked for me. I'm doing a two-way sync between Windows 7 (using Cygwin with the rsync package installed) and a FreeNAS file server (FreeNAS runs on FreeBSD with the rsync package pre-installed).
You might use Osync: http://www.netpower.fr/osync , which is rsync based with intelligent deletion propagation. It also has options like resuming a halted execution, soft deletion, and time control.
You could try csync, the sync engine under the hood of ownCloud.
I'm surprised no one has mentioned Syncthing yet. I have been using it for years to synchronize my phone, my tablet and my two laptops. One time I also used it to send 10 GB of photos to my family ~600 km away, straight from my machine to theirs, and it was incredibly fast (despite the data getting routed through Syncthing's relay servers to work around NAT issues). I also tried OwnCloud/NextCloud at some point, but Syncthing has been much more reliable and also much faster.
I'm now using SparkleShare https://www.sparkleshare.org/
It works on macOS, Linux and Windows.
I'm not sure whether it works with two-way syncing, but for --delete to work you also need directory recursion (--recursive, or --dirs); note that -a already implies --recursive.
Rclone is what you are looking for. Rclone ("rsync for cloud storage") is a command line program to sync files and directories to and from different cloud storage providers including local filesystems. Rclone was previously known as Swiftsync and has been available since 2013.
Is there a tool that creates a diff of a file structure, perhaps based on an MD5 manifest? My goal is to send a package across the wire that contains new/updated files and a list of files to remove. It needs to copy over new/updated files and remove files that have been deleted from the source file structure.
You might try rsync. Depending on your needs, the command might be as simple as this:
rsync -az --del /path/to/master dup-site:/path/to/duplicate
Quoting from rsync's web site:
rsync is an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently being maintained by Wayne Davison.
Or, if you prefer Wikipedia:
rsync is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
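If you do want the manifest-based package the question describes, here is one possible sketch using find, md5sum, sort and comm. It assumes GNU coreutils md5sum and file names without newlines or backslashes; manifest and tree_diff are hypothetical helper names, and the demo trees are throwaway examples.

```shell
#!/bin/sh
# Diff two directory trees via MD5 manifests (sketch, GNU coreutils md5sum assumed).
# Assumes file names without newlines or backslashes (md5sum escapes those).
set -eu
export LC_ALL=C   # sort and comm must agree on collation

# manifest DIR: "md5  ./relative/path" for every regular file, sorted
manifest() {
  (cd "$1" && find . -type f -exec md5sum {} + | sort)
}

# tree_diff OLD NEW OUT: OLD is what the destination has, NEW is the source of truth.
tree_diff() {
  manifest "$1" > "$3/old.manifest"
  manifest "$2" > "$3/new.manifest"
  # Whole lines only in NEW: files that were added or whose contents changed.
  # cut -c35- drops the 32-char hash and the two separator spaces.
  comm -13 "$3/old.manifest" "$3/new.manifest" | cut -c35- > "$3/to_copy.txt"
  # Paths only in OLD: files deleted from the source tree.
  cut -c35- "$3/old.manifest" | sort > "$3/old.paths"
  cut -c35- "$3/new.manifest" | sort > "$3/new.paths"
  comm -23 "$3/old.paths" "$3/new.paths" > "$3/to_remove.txt"
}

# Demo on two small throwaway trees.
demo=$(mktemp -d)
mkdir "$demo/old" "$demo/new"
echo same > "$demo/old/same.txt";  echo same > "$demo/new/same.txt"
echo v1 > "$demo/old/changed.txt"; echo v2 > "$demo/new/changed.txt"
echo bye > "$demo/old/gone.txt"
echo hi > "$demo/new/added.txt"
tree_diff "$demo/old" "$demo/new" "$demo"

# Package for the wire: the new/changed files; send to_remove.txt alongside it.
(cd "$demo/new" && tar czf "$demo/update.tar.gz" -T "$demo/to_copy.txt")
```

On the receiving side, the archive is extracted over the target tree and the paths in to_remove.txt are deleted.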
@vfilby I'm in the process of implementing something similar.
I've been using rsync for a while, but it gets funky when deploying to a remote server with permission changes that are out of my control. With rsync you can choose not to include permissions, but they still end up being considered for some reason.
I'm now using git diff. This works very well for text files. Diff generates patches, rather than a MANIFEST that you have to include with your files. The nice thing about patches is that there is already an established framework for using and testing them before they're applied.
For example, with the patch utility that comes standard on any Unix-like box, you can run the patch in dry-run mode. This tells you whether the patch is actually going to apply before you run it, which helps you make sure that the files you're updating have not changed while you were preparing the patch.
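A minimal sketch of that check-before-apply flow, with plain diff -ruN standing in for git diff; the directory and file names are hypothetical throwaway examples, and it assumes a patch implementation that supports --dry-run and -d (GNU patch and macOS patch both do).

```shell
#!/bin/sh
# Ship an update as a unified diff and check it with patch --dry-run first (sketch).
# Plain diff -ruN stands in for git diff here; the tree and file names are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/old" "$work/new" "$work/target"
printf 'line 1\nline 2\n' > "$work/old/config.txt"
printf 'line 1\nline 2 changed\n' > "$work/new/config.txt"
cp "$work/old/config.txt" "$work/target/"      # target matches the old tree

# Recursive unified diff; -N treats missing files as empty so adds/deletes are included.
# diff exits 1 when the trees differ, which is expected here, so only status >1 aborts.
(cd "$work" && diff -ruN old new > update.patch) || [ $? -eq 1 ]

# --dry-run reports whether every hunk applies without touching any file;
# -p1 strips the leading old/ and new/ path components.
if patch --dry-run -p1 -d "$work/target" < "$work/update.patch"; then
  patch -p1 -d "$work/target" < "$work/update.patch"
else
  echo "patch would not apply cleanly; target has drifted" >&2
  exit 1
fi
```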
If this is similar to what you're looking for, I can elaborate on my process.