Download files from multiple links in Google Cloud Storage

I have 1000 files in Google Cloud Storage, but they are spread across multiple directories, so how can I download them all at once?
I put all the links in an Excel file and used this command:
cat C:/Users/tm/files.xlsx | C:/Users/tm/AppData/Local/Google/Cloud_SDK/google-cloud-sdk/bin/gsutil.cmd -m cp -I C:/Users/tm/Desktop/files
Then I got this message:
stat: embedded null character in path
CommandException: 1 file/object could not be transferred.
Thanks in advance

You can download a whole directory tree recursively with the -R flag:
gsutil -m cp -R gs://your-bucket-name/path/to/directory/ C:/Users/tma/Desktop/files
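If the objects are scattered across many directories, the stdin approach with cp -I also works, but the list has to be plain text (one gs:// URL per line), not a spreadsheet; the embedded null characters in the binary .xlsx file are what trigger the stat error. A hedged sketch, assuming the links are exported to a hypothetical files.txt:
cat C:/Users/tm/files.txt | C:/Users/tm/AppData/Local/Google/Cloud_SDK/google-cloud-sdk/bin/gsutil.cmd -m cp -I C:/Users/tm/Desktop/files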

Related

How to completely download Anaconda Cloud bz2 files and dependencies for offline package installation?

I want to create a Python environment with the data science libraries NumPy, Pandas, PyTorch, and Hugging Face Transformers. I use Miniconda to create the environment and to download and install the libraries. There is a flag in conda install, --download-only, to download the required packages without installing them so that they can be installed afterwards from a local directory. However, even when conda just downloads the packages without installing them, it also extracts them.
Is it possible to download the packages without extracting them and extract them afterwards before installation?
There is no simple command in the CLI to prevent the extraction step. The extraction is regarded as part of the FETCH operation to populate the package cache before running the LINK operation to transfer the package to the specified environment.
The alternative would be to do something manually. Naively, one could search Anaconda Cloud and download packages by hand; however, it would probably be better to go through the solver to ensure package compatibility. All the info for the operations to be run can be viewed by including the --json flag, and this can be filtered down to just the tarball URLs, which are then downloaded directly. Here's a script along these lines (assuming Linux/Unix):
File: conda-download.sh
#!/bin/bash -l
# Dry-run solve (-d) into a throwaway env name, emit the plan as JSON,
# extract the package tarball URLs, and download them with wget (-c resumes partial files).
conda create -dn null --json "$@" |\
grep '"url"' | grep -oE 'https[^"]+' |\
xargs wget -c
which can be used as
./conda-download.sh -c conda-forge -c pytorch numpy pandas pytorch transformers
that is, it accepts all arguments conda create would, and will download all the tarballs locally.
Ignoring Cached Packages
If you already have some packages cached then the above will not redownload them. Instead, if you wish to download all tarballs needed for an environment, then you could use this alternate version which overrides the package cache using an empty temporary directory:
File: conda-download-all.sh
#!/bin/bash -l
# Same as above, but point CONDA_PKGS_DIRS at an empty temporary directory so
# already-cached packages are ignored and every tarball gets downloaded.
tmp_dir=$(mktemp -d)
CONDA_PKGS_DIRS=$tmp_dir conda create -dn null --json "$@" |\
grep '"url"' | grep -oE 'https[^"]+' |\
xargs wget -c
rm -r "$tmp_dir"
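For the later offline step, one possible pattern (a sketch, not from the original answer; conda accepts local tarball paths as install specs, but dependency resolution is skipped for file installs, so pass every downloaded tarball at once) would be:
# hedged sketch: install the previously downloaded tarballs into a fresh env without network access
# (assumes all required tarballs were fetched into the current directory by the script above)
conda create -y -n offline-env
conda install -y -n offline-env --offline ./*.tar.bz2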
Perhaps what you really want is conda-pack? It lets you archive a conda environment for reproduction without using the internet or re-solving dependencies. To just avoid re-solving, you can also use conda env export --explicit, but that still ties you to the source (the internet or a local disk repository).
If you have a static (read-only) environment and really want to reduce Docker image size, you can volume-mount the environment at runtime. You would need to match the file paths (i.e., /opt/anaconda => /opt/anaconda).
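A minimal sketch of that idea (the /opt/anaconda path, the env name, the image name, and the numpy import are all placeholders): bind-mount the host environment read-only into the container at the same absolute path, so hard-coded prefixes and shebangs keep working.
# hedged sketch: reuse a host conda install inside a slim container by bind-mounting it
docker run --rm -v /opt/anaconda:/opt/anaconda:ro my-slim-image \
  /opt/anaconda/envs/myenv/bin/python -c 'import numpy; print(numpy.__version__)'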

How to upload a large number of files to a Google Storage bucket?

What is the right way to upload a specific folder with thousands of files and subfolders from a PC to a Google Storage bucket?
I tried with the gsutil command:
gsutil -m cp -r myfolder gs://my-bucket
But the transfer stops after uploading only a few hundred files and then throws a Python error.
Is this the right way to do this?
Microsoft Azure Storage has a (wonderful) graphical tool called Microsoft Azure Storage Explorer, and its azcopy command works perfectly, uploading thousands of files very quickly.
You can use gsutil with the -R recursive flag to upload all the files and sub-directories.
gsutil -m cp -R dir gs://my_bucket
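If the transfer keeps dying partway through, a hedged alternative is gsutil rsync, which only copies objects that are missing or changed, so re-running the same command after a failure effectively resumes the upload (folder and bucket names below are the ones from the question):
gsutil -m rsync -r myfolder gs://my-bucket/myfolder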

What option should I use with wget to select only a few files with a particular extension from an FTP directory?

I am trying to download files that have a particular datestamp in their names from a folder on an FTP server. Since the folder contains other files as well, I want to download only the files with that particular datestamp.
I tried using wget files_datestamp*.extension, which didn't work.
I also tried using wget -i files_datestamp*.extension, which downloads everything.
My question is: what option should I use with wget to download only the particular files I am interested in?
wget http://collaboration.cmc.ec.gc.ca/cmc/CMOI/NetCDF/NMME/1p0deg/#%23%23/CanCM3_201904_r4i1p1_20190501*.nc4
The link you've shared is over HTTP and not FTP. As a result, it is not possible to glob over the filenames; that is feasible only over FTP.
With HTTP, it is imperative that you have access to a directory listing page that tells you which files are available. Then use -r --accept-regex=<regex here> to download your files.
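A hedged sketch of that approach (the directory URL is shortened here and the regex is only an example built from the datestamp in the question): -r recurses from the listing page, -np keeps wget from climbing to parent directories, -nd flattens the output, and --accept-regex keeps only matching files.
wget -r -np -nd --accept-regex 'CanCM3_201904_r4i1p1_20190501.*\.nc4' \
  http://collaboration.cmc.ec.gc.ca/cmc/CMOI/NetCDF/NMME/1p0deg/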

Need to download a file from a website from the command line

I have a link on my website that, when clicked, dynamically creates a CSV file and downloads it. I need a way to do this in a batch file so that the file can be downloaded automatically (via Task Scheduler). I have played around with wget but I can't get the file. Thank you in advance for your help!
bitsadmin.exe /transfer "Job Name" downloadUrl destination
If you are using Windows 7, use the same command in PowerShell.
Note:
downloadUrl: the download URL from the website in question
destination: the full path of the file where it should be saved
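For example (the job name, URL, and destination path below are placeholders, not taken from the question; bitsadmin needs the destination to be a full file path, not just a folder):
bitsadmin.exe /transfer "NightlyCsvExport" "https://example.com/reports/export.csv" "C:\scheduled\export.csv"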
I use it as follows:
#plain wget
wget "http://blah.com:8080/etc/myjar.jar"
#wget but skirting proxy settings
wget --no-proxy "http://blah.com:8080/etc/myjar.jar"
Or to download to a specific filename (perhaps to enable consistent naming in scripts):
wget -O myjar.jar --no-proxy "http://blah.com:8080/etc/myjar1.jar"
If you're having issues, turn on wget logging and possibly debugging (the debug output is added to your log):
# additional logging
wget -o myjar1.jar.log "http://blah.com:8080/etcetcetc/myjar1.jar"
#debug (if wget was compiled with debug symbols only!)
wget -o myjar1.jar.log -d "http://blah.com:8080/etc/myjar1.jar"
Additional checks you may need to do if still no success:
Can you ping the target host?
Can you "see" the target file in a browser?
Is the target file actually on the server?

wget all files from a folder, skipping the first 10000 files or so

I am transferring files from a folder on one server to another, and I am using wget to do so.
The problem is that wget gets terminated, and when I rerun the command it starts from the very first file. Although I use -nc to skip files that already exist, it still traverses all the files and only then skips the existing ones, so it takes too much time just skipping files.
I want to know whether there is any way to have wget start downloading directly from the new files instead of checking each file from the top.
I hope I have made my question clear; pardon me if I haven't.
This is the command that I am using:
wget -H -r --level=1 -k -p -nc http://www.example.com/images/
You could try using a reject-list to skip already downloaded files.
If all your files are in the same directory, it could be as simple as:
wget -R "`ls -1 | tr "\n" ,`" <your own options>
I am not sure what will happen with partial downloads.
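Putting the question's command and the reject-list idea together, a hedged sketch (the directory name is whatever wget -r created on the first run, and a very long reject list may hit command-line length limits):
# rebuild the reject list from files already mirrored, then resume the recursive fetch
done_list=$(ls -1 www.example.com/images/ | tr '\n' ',')
wget -H -r --level=1 -k -p -R "$done_list" http://www.example.com/images/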
