I want to download some files from an online database, but it does not allow me to download all the files at once. Instead, it offers to download one file per searched keyword. Because I have more than 20,000 keywords, downloading them one at a time is not feasible for me.
For example, I want to download all the information about miRNA-mRNA interactions from SarBase, but it does not offer an option to download all of it at once.
How can I download it by writing a script? Can anybody help me?
Make a file called getdb.sh.
#!/bin/bash
echo "Downloading keywords from kw.txt."
# One request per keyword; quote the URL so the shell does not expand the '?'.
for kw in $(cat kw.txt)
do
    curl "http://www.mirbase.org/cgi-bin/get_seq.pl?acc=$kw" > "$kw.txt"
done
Create another file called kw.txt:
MI0000342
MI0000343
MI0000344
Then run it:
$ chmod +x getdb.sh
$ ./getdb.sh
Downloading keywords from kw.txt.
$ ls -1 *.txt
kw.txt
MI0000342.txt
MI0000343.txt
MI0000344.txt
Another way (xargs' -i is deprecated; -I {} is the current form):
cat kw.txt | xargs -I {} curl -o {}.txt "http://www.mirbase.org/cgi-bin/get_seq.pl?acc={}"
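If the server can handle it, a small variation on that one-liner (a sketch, assuming GNU xargs, whose -P flag runs jobs in parallel) downloads several files at once. With 20,000 keywords, keep the parallelism modest so you do not overload the server:
# Run up to 4 curl downloads concurrently, one keyword per invocation.
cat kw.txt | xargs -P 4 -I {} curl -s -o {}.txt "http://www.mirbase.org/cgi-bin/get_seq.pl?acc={}"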
When using yt-audio, how can you remove the thumbnail image (artwork / screenshot) that comes with the downloaded mp3 file?
Ideally there would be a way of doing it by adding an additional argument to the command, but looping through the downloaded files would work too, if someone knows how to do that.
Just in case, this is the description of the yt-audio usage:
usage: yt-audio [OPTIONS] REQUIRED_ARGS
A simple, configurable youtube-dl wrapper for downloading and managing youtube audio.
Required Arguments (Any/all):
URL[::DIR] Video/Playlist URL with (optional) save directory [URL::dir]
-e, --example1 Example playlist [Custom]
--all All [Custom] Arguments
Optional Arguments:
-h, --help show this help message and exit
-v, --version show version and exit
--use-archive use archive file to track downloaded titles
--use-metadata use metadata to track downloaded titles
--output-format [OUTPUT_FORMAT]
File output format
--ytdl-args [YTDL_ADDITIONAL_ARGS]
youtube-dl additional arguments
Thank you all!!
So in the end, I found the answer to this one myself (quite ashamed of how long it took me, though).
To remove the thumbnail, don't download it.
That sums it up basically.
To not download the thumbnail, I simply needed to edit the common.py file once yt-audio was installed.
The file is in the installation, under: yt_audio/common.py
Editing the common.py file.
In common.py, find the Common class and edit DEFAULT_ARGUMENT_VALUES.
Or simply replace the value assigned to it with this thumbnail-less version (below).
DEFAULT_ARGUMENT_VALUES = {
    'download_command': 'youtube-dl -x -q --print-json --audio-format mp3 --audio-quality 0 '
                        '--add-metadata -o "$OUTPUT$" $URL$',
    'playlist_info_command': 'youtube-dl --flat-playlist -J $PLAYLIST_URL$',
    'output_format': '%%(title)s.%%(ext)s',
    'ffprobe_command': 'ffprobe -v quiet -print_format json -show_format -hide_banner "$PATH$"',
    'output_directory': str(PurePath(Path.home(), "Music"))
}
That's it.
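For the other route mentioned in the question, fixing mp3s that were already downloaded with artwork embedded, here is a minimal sketch, assuming ffmpeg is available: the cover image is stored as a separate stream inside the mp3, so copying only the audio stream drops it.
for f in *.mp3; do
    # -map 0:a keeps only the audio streams (embedded cover art is an
    # attached picture stream), and -codec copy avoids re-encoding.
    ffmpeg -i "$f" -map 0:a -codec copy "noart_$f"
done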
I am using Solr 5 and I want to index documents that have no extensions. Unfortunately, changing the files to have extensions is not an option in my case.
The command I am using is simply:
$ bin/post -c mycore ../foldertobescaned -type application/pdf
The command works fine for documents that do have an extension, but otherwise I get:
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
If renaming the files is not an option, you can use the following script as a workaround until Solr improves its post method. It is a simple bash for loop that submits each file individually and works regardless of the file extension. Note that this script will be slower than using post on the whole folder, because each individual file transfer needs to be initialized.
Save the script below as postFolderToSolr.sh inside your Solr folder (so that Solr's bin/ folder is a subdirectory), make it executable with chmod +x postFolderToSolr.sh, and then use it as follows: ./postFolderToSolr.sh mycore /home/user1/foldertobescaned/ application/pdf
Using no arguments or the wrong number of arguments prints a short usage message as help.
#!/bin/bash
set -o nounset
if [ "$#" -ne 3 ]
then
    echo "Post contents of a folder to Solr."
    echo
    echo "Usage: postFolderToSolr.sh <collectionName> </path/to/folder> <MIME>"
    echo
    exit 1
fi
collection=$1
inputPath=${2%/} # remove trailing / if it exists
mime=$3
# Post every file in the folder individually, regardless of extension.
for element in "$inputPath"/*; do
    bin/post -c "$collection" -type "$mime" "$element"
done
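If the one-file-at-a-time startup cost is too slow, the for loop could be replaced with a parallel variant. This is only a sketch, assuming GNU xargs; whether concurrent posts actually help depends on your Solr setup:
# Post up to 4 files concurrently; -print0/-0 keeps odd file names safe.
find "$inputPath" -maxdepth 1 -type f -print0 | xargs -0 -P 4 -I {} bin/post -c "$collection" -type "$mime" {}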
Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.
Here is a native way to do this in bash, with a line-by-line explanation of the code below:
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt
The solution builds two lists, one for the source and one for the destination files (to be used in the "gsutil mv" command):
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
The line "gsutil mv " and the two files are concatenated line by line using the below code:
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /'
This then runs each line in a while loop:
while read line; do bash -c "$line"; done
Lastly, clean up and delete the files created:
rm src-rename-list.txt; rm dest-rename-list.txt
The above has been tested against a working Google Storage bucket.
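The same rename can also be written as a single loop with no temporary files. This is a sketch that assumes the object paths contain no spaces:
for src in $(gsutil ls gs://bucket_name/*.JPG); do
    # ${src%.JPG} strips the old extension before the new one is appended.
    gsutil mv "$src" "${src%.JPG}.jpg"
done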
gsutil supports URI wildcards: https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames
EDIT
From the gsutil 3.0 release notes:
As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...
Do you have directories under the bucket? If so, you may need to go down into each directory, or use **.
gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg
or
gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg
EDIT
gsutil doesn't support wildcards in the destination so far (as of 4/12/'14), and neither does the API.
So at the moment you need to retrieve the list of all JPG files and rename each one.
A Python example:
import subprocess

# List every object ending in .JPG (check_output returns bytes, so decode).
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG", shell=True).decode()
for f in files.splitlines():
    # Swap the trailing "JPG" for "jpg" and move the object.
    subprocess.call("gsutil mv %s %s" % (f, f[:-3] + "jpg"), shell=True)
Please note that this would take hours.
gsutil does not support parallelized mass-copy/rename.
You have two options:
use a dataflow process to do the operation
or
use GNU parallel to launch it using several processes
If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:
First: Make a list of the files you want to copy/rename (a file with source and destination separated by a space or tab), like this (a sketch for generating such a list appears after these steps):
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
Second: Launch a new compute instance
Third: Log in to that instance and install GNU parallel:
sudo apt install parallel
Fourth: Authorize yourself with Google (gcloud auth login), because the service account for compute might not have permissions to move/rename the files:
gcloud auth login
Fifth: Run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt
This will make 20 parallel runs of the gsutil mv operation.
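As a sketch of the first step, the source/destination list for the JPG-to-jpg rename discussed above could be generated like this (bucket names are placeholders):
# Emit "source destination" pairs, changing only the extension's case.
gsutil ls gs://origin_bucket/**.JPG | while read -r src; do
    echo "$src ${src%.JPG}.jpg"
done > file_with_source_destination_uris.txt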
Yes, it is possible:
Move/rename objects and/or subdirectories
I'm trying to specify an FBX file in MEL using the command
file -f -pmt 0 -options "v=0;" -typ "FBX" -o
on one computer this works great. On another, it fails but DOES work if I use
-typ "Fbx"
I think I'd like to query for the supported translators in my script, then either select the correct one or report an error. Is this possible? Am I mis-diagnosing the problem?
MEL has a command called pluginInfo. You could write a simple function that returns the proper spelling based on it: pluginInfo -query -version "fbxmaya"; will give you the version of the fbx plugin. I haven't used MEL in a while, so I'm not going to try to make this perfect, but maybe something like string $fbxType; if (`pluginInfo -query -version "fbxmaya"` == $versionNeedingUppercase) { $fbxType = "FBX"; } else { $fbxType = "Fbx"; }. Then just plug that variable into file -f -pmt 0 -options "v=0;" -typ $fbxType -o.
It might be a different version of fbx. You'd have to provide another line which determines the version of fbx on that particular machine and pipes in the correct spelling.
I'm trying to download multiple files and need to rename them as I download. How can I do that and specify the directory I want them to download to? I know I need to be using -P and -O to do this, but it does not seem to be working for me.
OK, it's late to post my answer here, but I'll correct Bill's answer.
If you read man wget, you will see the following:
...
wget [option]... [URL]...
...
That is, the options come before the URL, so the form that aligns with the wget documentation is
wget -O /directory_path/filename.file_format https://example.com
Remember: Just because it works doesn't mean it's right!
I ran into a similar situation and came across your question. I was able to get what I needed by writing a little bash script that parses a file with URLs in one column and the desired names in the second.
This is the script I used for my particular requirement. Maybe it will give you some guidance if you still need help.
#!/bin/bash
# Read "URL name" pairs, one pair per line, and download each URL to name.jpg.
FILE=URLhtmlPageWImagesWids.txt
while read -r line
do
    F1=$(echo "$line" | cut -d " " -f1)
    F2=$(echo "$line" | cut -d " " -f2)
    wget -r -l1 --no-parent -A.jpg -O "$F2.jpg" "$F1"
done < "$FILE"
This won't work actually because -O combines all results into one page.
You could try using the --no-directories or --cut-dirs switches, and then in the loop process the files in the folder to rename them however you want.
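A sketch of that approach (the URL and the rename rule are placeholders; --no-directories makes wget drop the site's folder structure so every file lands in one directory):
# Download the .jpg files flat into ./downloads, then rename them in a loop.
wget -r -l1 --no-parent -A .jpg --no-directories -P downloads "http://example.com/gallery/"
for f in downloads/*.jpg; do
    mv "$f" "downloads/renamed_$(basename "$f")"
done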
wget your_url -O your_specified_dir/your_name
Like Bill was saying,
wget http://example.com/original-filename -O /home/new_filename
worked for me!
Thanks
This may work for everyone:
mkdir Download1
wget -O "Download1/test 10mb.zip" "http://www.speedtest.com.sg/test_random_10mb.zip"
You need to use quotes (" ") for a name with spaces.
I'm a little late to the party, but I just wrote a script to do this. You can check it out here: bulkGetter