wget and strange file extensions

I am having a bit of trouble grabbing some files that have a strange naming structure. Here is an example of what I mean: http://downloads.cloudmade.com/americas/northern_america/united_states/district_of_columbia#downloads_breadcrumbs
I want to start at the root of the site and recursively grab all the files that end with *.shapefile.zip. wget appears to treat this as two separate extensions, .shapefile and .zip. Does anyone have some wget goodness to help me get started on this one?

You can recursively wget specific file types with:
wget -A 'shapefiles.zip' -r <url>
That said, I don't think .shapefiles.zip is a special variant of the .zip extension; it is more likely just that site's naming convention.
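
As a rough illustration of why the plain suffix form works here: the wget manual says that an -A or -R entry containing any of the wildcard characters *, ?, [ or ] is treated as a pattern, and otherwise as a plain file name suffix. The sketch below imitates that rule in Python. It is an approximation for reasoning about an accept list, not wget's actual code, and the file names are made up.

```python
from fnmatch import fnmatchcase

# Characters that make wget treat an accept-list entry as a pattern
# rather than a plain suffix (per the wget manual).
WILDCARDS = set("*?[]")


def is_accepted(filename: str, acclist: str) -> bool:
    """Approximate wget -A matching for a comma-separated accept list."""
    for entry in acclist.split(","):
        if not entry:
            continue
        if WILDCARDS & set(entry):
            # Pattern entry: shell-style match against the whole file name.
            if fnmatchcase(filename, entry):
                return True
        elif filename.endswith(entry):
            # Suffix entry: a plain "ends with" comparison.
            return True
    return False


# Hypothetical file names, for illustration only.
names = [
    "district_of_columbia.shapefiles.zip",
    "district_of_columbia.osm.bz2",
    "index.html",
]
kept = [n for n in names if is_accepted(n, "shapefiles.zip")]
print(kept)
```

Under this reading, the double extension is not a problem: 'shapefiles.zip' is compared against the end of the whole file name, so nothing is split at the first dot.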

Related

Using wget but only getting html files

I am trying to download multiple NETCDF files from NASA website.
So I was following their tutorial on how to download multiple files using wget for Windows (https://disc.gsfc.nasa.gov/data-access#windows_wget).
When I try to use the option to download multiple data files at once, the output I get back is only HTML files and not the NetCDF files. Does anyone know what might be happening?
P.S.: I am executing the following command:
wget --load-cookies C:\.urs_cookies --save-cookies C:\.urs_cookies --auth-no-challenge=on --keep-session-cookies --user=<your username> --ask-password --content-disposition -i <url.txt>

What "option" to use with "WGET" for selecting only few files with particular extension from a FTP directory

I am trying to download files that have a particular datestamp in their names from a folder on an FTP server. Since the folder also contains many other files, I want to download only the files with that datestamp.
I tried using wget files_datestamp*.extension, which didn't work.
I also tried using wget -i files_datestamp*.extension, which downloads everything.
My question is: what option should I use with wget to download only the particular files that I am interested in?
wget http://collaboration.cmc.ec.gc.ca/cmc/CMOI/NetCDF/NMME/1p0deg/#%23%23/CanCM3_201904_r4i1p1_20190501*.nc4
The link you've shared is over HTTP, not FTP. As a result, it is not possible to glob over the filenames; that is feasible only over FTP.
With HTTP, you need access to a directory listing page that tells you which files are available. Then use -r --accept-regex=<regex here> to download your files.
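
To make the HTTP case concrete, here is a minimal Python sketch of what "read the directory listing, then filter by regex" amounts to. The HTML snippet, the file names, and the pattern are invented for illustration; a real server's listing page may be structured differently.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a listing page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def matching_links(listing_html, pattern):
    """Return the hrefs whose full text matches the regular expression."""
    collector = LinkCollector()
    collector.feed(listing_html)
    regex = re.compile(pattern)
    return [href for href in collector.links if regex.fullmatch(href)]


# Made-up listing page standing in for the server's index.
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent Directory</a>
<a href="CanCM3_201904_r4i1p1_20190501-20200430.nc4">file 1</a>
<a href="CanCM3_201904_r5i1p1_20190501-20200430.nc4">file 2</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

wanted = matching_links(SAMPLE_LISTING, r"CanCM3_201904_r4i1p1_20190501.*\.nc4")
print(wanted)
```

One difference to keep in mind: as far as I know, wget's --accept-regex is matched against the complete URL rather than the bare file name, so a pattern passed to wget would typically need a leading .* in front of the file name part.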

Downloading artifacts from Jenkins using wget or curl

I am trying to download an artifact from a Jenkins project using a DOS batch script. The reason that this is more than trivial is that my artifact is a ZIP file which includes the Jenkins build number in its name, hence I don't know the exact file name.
My current plan of attack is to use wget pointing at: /lastSuccessfulBuild/artifact/
to do some sort of recursive/mirror download.
If I do the following:
wget -r -np -l 1 -A zip --auth-no-challenge --http-user=**** --http-password=**** http://*.*.*.*:8080/job/MyProject/lastSuccessfulBuild/artifact/
(*s are chars I've changed for posting to SO)
I never get a ZIP file. If I omit the -A zip option, I do get the index.html, so I think the authorisation is working, unless it's some sort of session caching issue?
With -A zip I get as part of the response:
Removing ...+8080/job/MyProject/lastSuccessfulBuild/artifact/index.html since it should be rejected.
So I'm not sure if maybe it's removing that file and so not following its links? But doing -A zip,html doesn't work either.
I've tried several wget options, and also curl, but I am getting nowhere.
I don't know if I have the wrong wget options or whether there is something special about Jenkins authentication.
You can add /*zip*/desired_archive_name.zip to any folder of the artifacts location.
If your ZIP file is the only artifact that the job archives, you can use:
http://*.*.*.*:8080/job/MyProject/lastSuccessfulBuild/artifact/*zip*/myfile.zip
where myfile.zip is just a name you assign to the downloadable archive; it could be anything.
If you have multiple artifacts archived, you can either still get the ZIP file of all of them and deal with the individual ones on extraction, or place the artifact that you want into a separate folder and apply the /*zip*/ to that folder.
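
A small sketch of how such URLs could be assembled in a script. The function name, the host, and the folder name are hypothetical; only the /*zip*/ suffix convention and the lastSuccessfulBuild/artifact path come from the text above.

```python
def artifact_zip_url(job_url, archive_name, folder=""):
    """Build a Jenkins URL that serves an artifact folder as one ZIP file."""
    base = job_url.rstrip("/") + "/lastSuccessfulBuild/artifact/"
    if folder:
        # Optional sub-folder of the archived artifacts.
        base += folder.strip("/") + "/"
    return base + "*zip*/" + archive_name


# Placeholder host and job name, not a real server.
JOB_URL = "http://jenkins.example.invalid:8080/job/MyProject"

all_artifacts = artifact_zip_url(JOB_URL, "myfile.zip")
one_folder = artifact_zip_url(JOB_URL, "dist.zip", folder="dist")
print(all_artifacts)
print(one_folder)
```

The resulting URL points at a single file, so it can be fetched with wget or curl without any recursive options, using the same authentication flags as in the question. When passing it on a command line, quote the URL so the asterisks are not interpreted by the shell.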

Find batch command to copy relative path not working

First of all, I don't know Batch programming at all. I came across a FIND command in a tutorial I was reading about OpenCV
http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
find ./positive_images -iname "*.jpg" > positives.txt
It basically is supposed to copy all the relative paths of all the jpeg files inside positive_images directory to positives.txt file. I ran this in CMD(as Administrator) and got the following:
What is the meaning of Access Denied? I don't want to learn Batch Programming for this as I am already busy in my project. Please give me a simple-to-understand solution.
The tutorial you refer to uses the Unix find command, but you are executing the Windows find command, which is a different program. Download a Windows port of the Unix command, put it in your directory, and call it like this:
.\find.exe .\positive_images -iname "*.jpg" > positives.txt
Also mind the Windows path separators (backslashes instead of forward slashes).
You can use this port, for example: http://unxutils.sourceforge.net/
(There is probably a newer port, but this one should do the job.)
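
If installing a port is not convenient, a similar listing can be produced with a short Python script, which sidesteps the clash with the Windows find command altogether. This is a sketch of an alternative, not what the tutorial uses. The function names are made up, and it assumes the images live under a positive_images folder relative to the current directory.

```python
from pathlib import Path


def list_images(root, suffix=".jpg"):
    """Return paths of files under root whose name ends with suffix,
    compared case-insensitively, similar to find's -iname "*.jpg"."""
    root_path = Path(root)
    return sorted(
        path.as_posix()
        for path in root_path.rglob("*")
        if path.is_file() and path.name.lower().endswith(suffix)
    )


def write_listing(root, output_file):
    """Write one path per line, like redirecting find's output to a file."""
    lines = list_images(root)
    Path(output_file).write_text("".join(line + "\n" for line in lines))
    return lines
```

Calling write_listing("positive_images", "positives.txt") from the folder that contains positive_images should give a file comparable to the one in the tutorial. The paths are written with forward slashes and without the leading ./ that find prints, and they are sorted, whereas find does not guarantee any order.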

wget all files from folder skip first 10000 files or so

I am transferring files from a folder on one server to another and I am using wget to do so.
The problem is that wget gets terminated, and when I rerun the command it starts from the very first file. I use -nc to skip files that already exist, but it still traverses all the files before skipping the existing ones, so the skipping itself takes too much time.
Is there any way to have wget start downloading directly from the first new file instead of checking each file from the top?
I hope I have made my question clear. Pardon me if I haven't.
This is the command that I am using:
wget -H -r --level=1 -k -p -nc http://www.example.com/images/
You could try using a reject-list to skip already downloaded files.
If all your files are in the same directory, it could be as simple as:
wget -R "$(ls -1 | tr '\n' ',')" <your own options>
I am not sure what will happen with partial downloads.
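
The same idea can be sketched in Python, building the reject list from the directory contents instead of piping ls through tr. This keeps the assumption above that all files are in one directory. The function names are made up, and the wget options are simply the ones from the question; the command is assembled but not executed.

```python
import os


def build_reject_list(directory="."):
    """Join the names of regular files in directory into a comma-separated
    string suitable for wget's -R option."""
    names = sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file()
    )
    return ",".join(names)


def wget_command(url, directory="."):
    """Assemble (but do not run) a wget command that rejects existing files."""
    command = ["wget", "-H", "-r", "--level=1", "-k", "-p", "-nc"]
    reject = build_reject_list(directory)
    if reject:
        command += ["-R", reject]
    command.append(url)
    return command
```

Some caveats apply to both forms. Reject-list entries without wildcards are treated as suffixes, so an existing a.jpg would also cause ba.jpg to be rejected. File names containing commas or wildcard characters would confuse the list. With many thousands of files, the argument may also exceed the operating system's command-line length limit.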
