Setting default cache-control header for gsutil rsync - google-app-engine

Since there is unfortunately no way to set a default cache-control header for a bucket (which there really should be...), is there a way to specify a default cache-control header for the gsutil rsync command? Or barring that, an easy way to programmatically get a list of all the files actually changed during the rsync, that we can pass into the setmeta command?
Since quite a few files change with each deploy, we currently run setmeta on ** after every deploy, which takes an unreasonable amount of time. We haven't figured out any better way to reliably ensure correct cache-control headers for all files in the Cloud Storage bucket, since there is no way to set proper defaults for either the bucket itself or the rsync command. Is there a better way to accomplish this? What are we missing?

If you want the same Cache-Control header for all the files you upload, you could use the gsutil -h option to cause the gsutil rsync command to set the Cache-Control headers. Example:
gsutil -m -h "Cache-Control:private, max-age=0, no-transform" rsync -r ./dir gs://my-bucket

Depending on your needs, you can also use no-cache:
gsutil -m -h "Cache-Control:no-cache" rsync -r ./dir gs://my-bucket
A list of Cache-Control directives can be found in the Mozilla (MDN) documentation.
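If some objects were already uploaded without the header, a follow-up gsutil setmeta pass can fix them after the fact. This is only a sketch; the bucket name, prefix and header value are placeholders:
gsutil -m setmeta -h "Cache-Control:private, max-age=0, no-transform" gs://my-bucket/static/**
Limiting the wildcard to a prefix such as static/ keeps the pass much faster than running it over the whole bucket with **.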

Related

What "option" to use with "WGET" for selecting only few files with particular extension from a FTP directory

I am trying to download files with a particular datestamp as part of their extension from a folder on an FTP server. Since the folder contains many other files, I want to download only the files with that particular datestamp.
I tried using wget files_datestamp*.extension, which didn't work.
I also tried using wget -i files_datestamp*.extension, which downloads all.
My question is: What option to use with wget to download only particular files that I am interested in?
wget http://collaboration.cmc.ec.gc.ca/cmc/CMOI/NetCDF/NMME/1p0deg/###/CanCM3_201904_r4i1p1_20190501*.nc4
The link you've shared is over HTTP, not FTP. As a result, it is not possible to glob over the filenames; that is feasible only over FTP.
With HTTP, you need access to a directory listing page which tells you which files are available. Then use -r --accept-regex=<regex here> to download your files.
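For example, assuming the server exposes a directory listing at that path and using a regex based on the filenames you describe (both assumptions), something like:
wget -r -np -nd -l 1 --accept-regex='CanCM3_201904_r4i1p1_20190501.*\.nc4' "http://collaboration.cmc.ec.gc.ca/cmc/CMOI/NetCDF/NMME/1p0deg/"
Note that --accept-regex requires a reasonably recent wget (1.14 or later).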

Downloading artifacts from Jenkins using wget or curl

I am trying to download an artifact from a Jenkins project using a DOS batch script. The reason that this is more than trivial is that my artifact is a ZIP file which includes the Jenkins build number in its name, hence I don't know the exact file name.
My current plan of attack is to use wget pointing at: /lastSuccessfulBuild/artifact/
to do some sort of recursive/mirror download.
If I do the following:
wget -r -np -l 1 -A zip --auth-no-challenge --http-user=**** --http-password=**** http://*.*.*.*:8080/job/MyProject/lastSuccessfulBuild/artifact/
(*s are chars I've changed for posting to SO)
I never get a ZIP file. If I omit the -A zip option, I do get the index.html, so I think the authorisation is working, unless it's some sort of session caching issue?
With -A zip I get as part of the response:
Removing ...+8080/job/MyProject/lastSuccessfulBuild/artifact/index.html since it should be rejected.
So I'm not sure if maybe it's removing that file and so not following its links? But doing -A zip,html doesn't work either.
I've tried several wget options, and also curl, but I am getting nowhere.
I don't know if I have the wrong wget options or whether there is something special about Jenkins authentication.
You can add /*zip*/desired_archive_name.zip to any folder of the artifacts location.
If your ZIP file is the only artifact that the job archives, you can use:
http://*.*.*.*:8080/job/MyProject/lastSuccessfulBuild/artifact/*zip*/myfile.zip
where myfile.zip is just a name you assign to the downloadable archive; it could be anything.
If you have multiple artifacts archived, you can either still get a ZIP file of all of them and deal with the individual ones on extraction, or place the artifact you want into a separate folder and apply /*zip*/ to that folder.
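Combining that with the authentication flags from the question, something like the following should work (host, user and token are placeholders):
wget --auth-no-challenge --http-user=myuser --http-password=myapitoken "http://jenkins.example.com:8080/job/MyProject/lastSuccessfulBuild/artifact/*zip*/archive.zip"
or with curl:
curl -u myuser:myapitoken -o archive.zip "http://jenkins.example.com:8080/job/MyProject/lastSuccessfulBuild/artifact/*zip*/archive.zip"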

Wget Downloading Issue?

I have a problem with wget or my code..
wget -N --no-check-certificate "https://www.dropbox.com/s/qjf0ka54yuwz81d/Test.zip?dl=0"
The -N option is supposed to download the file only if it is newer in terms of modification date, but it downloads the file every time I run the script. I see "Last-modified header missing -- time-stamps turned off" at the end of the output.
I don't get it: when I upload/update a file on Dropbox, it does show a new "Modified" date-time, so why doesn't wget see it?
Any help would be highly appreciated. Cheers.
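One thing to check: -N can only work if the server sends a Last-Modified header, and the message you quote says it does not. You can confirm which headers the Dropbox link actually returns with, for example:
wget -S --spider "https://www.dropbox.com/s/qjf0ka54yuwz81d/Test.zip?dl=0"
(-S prints the server response headers; --spider skips the download.)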

Need to download file on website from command line

I have a link on my website that when clicked dynamically creates a csv file and downloads the file. I need a way to do this in a batch file so that the file can be downloaded automatically (via task scheduler). I have played around with wget but I can't get the file. Thank you in advance for your help!
bitsadmin.exe /transfer "Job Name" downloadUrl destination
If you are using Windows 7, you can run the same command in PowerShell.
Note:
downloadUrl : the download URL from the website in question
destination : the local path the file should be downloaded to
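For example (the URL and local path are placeholders):
bitsadmin.exe /transfer "CsvDownload" "https://www.example.com/reports/export.csv" "C:\Downloads\export.csv"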
I use it as follows:
#plain wget
wget "http://blah.com:8080/etc/myjar.jar"
#wget but skirting proxy settings
wget --no-proxy "http://blah.com:8080/etc/myjar.jar"
Or to download to a specific filename (perhaps to enable consistent naming in scripts):
wget -O myjar.jar --no-proxy "http://blah.com:8080/etc/myjar1.jar"
If you're having issues, enable wget logging, and possibly debug output as well (which is written to the same log):
# additional logging
wget -o myjar1.jar.log "http://blah.com:8080/etcetcetc/myjar1.jar"
#debug (only available if wget was compiled with debug support!)
wget -o myjar1.jar.log -d "http://blah.com:8080/etc/myjar1.jar"
Additional checks you may need to do if still no success:
Can you ping the target host?
Can you "see" the target file in a browser?
Is the target file actually on the server?
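The last two checks can also be done from the command line, e.g. with a HEAD request against the same URL used above:
curl -I "http://blah.com:8080/etc/myjar.jar"
or:
wget --spider "http://blah.com:8080/etc/myjar.jar"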

How do I recursively ftp only certain file types from a linux server using the command line?

I want to download only .htm or .html files from my server. I'm trying to use ncftpget and even wget but only with limited success.
With ncftpget I can download the whole tree structure without a problem, but I can't seem to specify which files I want; it's either all or nothing.
If I specify the file type like this, it only looks in the top folder:
ncftpget -R -u myuser -p mypass ftp://ftp.myserver.com/public_html/*.htm ./local_folder
If I do this, it downloads the whole site and not just .htm files:
ncftpget -R -u myuser -p mypass ftp://ftp.myserver.com/public_html/ ./local_folder *.htm
Can I use ncftp to do this, or is there another tool I should be using?
You can do it with wget
wget -r -np -A "*.htm*" ftp://site/dir
or:
wget -m -np -A "*.htm*" ftp://user:pass@host/dir
However, as per Types of Files:
Note that these two options do not affect the downloading of HTML files (as determined by a .htm or .html filename suffix). This behavior may not be desirable for all users, and may be changed for future versions of Wget.
Does ncftpget understand dir globs?
Try
ncftpget -R -u myuser -p mypass ftp://ftp.myserver.com/public_html/**/*.htm ./local_folder
** means any number of directories.
The wget command understands standard Unix file-globbing syntax.
wget -r -np --ftp-user=username --ftp-password=password "ftp://example.com/path/to/dir/*.htm"
Alternatively, you can use the -A option, which accepts a comma-separated list of file name suffixes or patterns to accept.
wget -r -np -A '*.htm,*.html' "ftp://example.com/path/to/dir/"
The -R option is the opposite of -A, so you can use it to specify patterns NOT to fetch.
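For example, to recurse while skipping image files (the host and patterns are illustrative only):
wget -r -np -R '*.jpg,*.gif' "ftp://example.com/public_html/"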
Caveat: Make sure to quote patterns! Otherwise, your shell may expand the glob itself, leading to unexpected results.
Also! See the "Using wget to recursively download whole FTP directories" question on Server Fault.
