Mass rename objects on Google Cloud Storage - google-app-engine

Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.

Here is a native way to do this in bash, with a line-by-line explanation of the code below:
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt
The solution builds two lists, one for the source and one for the destination files (to be used in the "gsutil mv" command):
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
The line "gsutil mv " and the two files are concatenated line by line using the below code:
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /'
This then runs each line in a while loop:
while read line; do bash -c "$line"; done
Lastly, clean up and delete the files created:
rm src-rename-list.txt; rm dest-rename-list.txt
The above has been tested against a working Google Storage bucket.
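If you prefer to avoid the temporary files, the same rename can be done in a single pipeline. This is a sketch that assumes the object names contain no unusual whitespace:
gsutil ls gs://bucket_name/*.JPG | while read -r src; do gsutil mv "$src" "${src%.JPG}.jpg"; done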

https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames
gsutil supports URI wildcards
EDIT
From the gsutil 3.0 release notes:
As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...
Do you have directories under the bucket? If so, you may need to go down into each directory, or use **.
gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg
or
gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg
EDIT
gsutil doesn't support wildcards in the destination so far (as of 4/12/'14), and neither does the API.
So at this moment you need to retrieve the list of all JPG files
and rename each file.
Python example:
import subprocess
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG", shell=True)
files = files.split("\n")[:-1]
for f in files:
    subprocess.call("gsutil mv %s %s" % (f, f[:-3] + "jpg"), shell=True)
Please note that this would take hours.
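If that is too slow, one workaround is to run several gsutil mv processes at once with xargs instead; this is a sketch that assumes GNU xargs (adjust -P for the number of parallel processes):
gsutil ls gs://my_bucket/*.JPG | xargs -P 8 -I {} sh -c 'gsutil mv "$1" "${1%.JPG}.jpg"' _ {}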

gsutil does not support parallelized mass copy/rename operations.
You have two options:
use a dataflow process to do the operation
or
use GNU parallel to launch it using several processes
If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:
First: Make a list of the files you want to copy/rename (a file with the source and destination URIs separated by a space or tab), like this (one way to generate such a file is sketched after these steps):
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
Second: Launch a new Compute Engine instance.
Third: Log in to that instance and install GNU Parallel:
sudo apt install parallel
Fourth: Authorize yourself with Google (gcloud auth login), because the service account of the Compute Engine instance might not have permissions to move/rename the files:
gcloud auth login
Fifth: Run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt
This will run 20 parallel gsutil mv (or cp) operations.
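For the .JPG to .jpg rename from the original question, the list file itself can be generated with gsutil ls and awk, for example (a sketch; adapt the substitution if you are also moving the objects to a different bucket or path):
gsutil ls gs://origin_bucket/path/*.JPG | awk '{ dst = $0; sub(/\.JPG$/, ".jpg", dst); print $0, dst }' > file_with_source_destination_uris.txt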

Yes, it is possible:
Move/rename objects and/or subdirectories

Related

How can I insert Files into a zip file using Lua

I am writing a script in Lua 5.1 for use with a game engine (EDGE).
I need my script to copy about 20 files into a .miz file (which is really a zipped folder with a set structure), navigating that structure and copying the files in from a non-zipped folder on the hard drive.
Because Windows 11 is the future, I need to use NanaZip rather than 7z, which isn't supported on Windows 11.
However, all the examples I've found are for using Lua to zip up files, not to insert non-zipped files INTO an existing zip file without unzipping it.
Is this even possible?
Similar to @koyaanisqatsi, I tried it with 7z. You didn't answer our question about why 7z should be avoided, nor whether you are even allowed to use os.execute, but this should provide a good starting point:
os.execute("7z a yourZip.zip yourFile.png")
Where a is the flag for Add.
See the manual for other flags like compression: https://linux.die.net/man/1/7z
Windows 11 also ships with tar, which has the r (Add/Replace) and u (Update) options:
D:\temp>tar h
tar(bsdtar): manipulate archive files
First option must be a mode specifier:
-c Create -r Add/Replace -t List -u Update -x Extract
Common Options:
-b # Use # 512-byte records per I/O block
-f <filename> Location of archive (default \\.\tape0)
-v Verbose
-w Interactive
Create: tar -c [options] [<file> | <dir> | #<archive> | -C <dir> ]
<file>, <dir> add these items to archive
-z, -j, -J, --lzma Compress archive with gzip/bzip2/xz/lzma
--format {ustar|pax|cpio|shar} Select archive format
--exclude <pattern> Skip files that match pattern
-C <dir> Change to <dir> before processing remaining files
#<archive> Add entries from <archive> to output
List: tar -t [options] [<patterns>]
<patterns> If specified, list only entries that match
Extract: tar -x [options] [<patterns>]
<patterns> If specified, extract only entries that match
-k Keep (don't overwrite) existing files
-m Don't restore modification times
-O Write entries to stdout, don't restore to disk
-p Restore permissions (including ACLs, owner, file flags)
bsdtar 3.5.2 - libarchive 3.5.2 zlib/1.2.5.f-ipp bz2lib/1.0.6
( Above cmd.exe was opened from Lua with: os.execute('cmd') )
You can extract a ZIP with it, but not create one, as far as I know.
(tar -xf archive.zip)
But would it be a problem for you to use tar instead of ZIP?
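If switching to tar is acceptable, the r mode shown in the help above appends files to an existing archive without extracting it. Here is a rough sketch with hypothetical file names; append support depends on the archive format, so it is safest with a plain .tar archive rather than a .zip:
tar -cf mission.tar briefing.png
tar -rf mission.tar map1.png map2.png
tar -tf mission.tar
The first command creates the archive, the second appends more files without unpacking anything, and the third lists the entries to verify the result.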

Concatenate two GSUtil Command

I want to execute two gsutil commands in a single line. How can I achieve that?
For ex:
gsutil ls gs://projectname/bucketname/folder1/folder2/filename.png | \
cp gs://projectname/bucketname/folder3/folder4/
Find/list a file from a specific bucket and copy the same file to another bucket folder. In the above command I'm using the ls (list) and cp (copy) commands, but this is not working as expected.
Something similar to the shell script or Linux command below, where we use -exec to run the next command:
find -type f -path '*schedule*/*' -name "*.png" -exec cp -n {} /tmp/MusicFiles \;
Your early response is highly appreciated. Thanks in advance!
Oneliner: From gsutil help cp:
-I Causes gsutil to read the list of files or objects to copy from
stdin. This allows you to run a program that generates the list
of files to upload/download.
So, some_program | gsutil -m cp -I gs://my-bucket.
where some_program can be gsutil ls ...
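Applied to the paths from the question, that would look roughly like this (a sketch; the ls argument can also be a wildcard matching several objects):
gsutil ls gs://projectname/bucketname/folder1/folder2/filename.png | gsutil -m cp -I gs://projectname/bucketname/folder3/folder4/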
Additionally,
gsutil accepts wildcards, so that command would be something like this:
gsutil cp gs://bucket-name/*schedule*/*/*.png gs://bucketname/folder3/folder4/
Note that it is possible to use a single star * and a double star ** for recursive wildcards.

Strange behaviour of sejda-console in batch for loop

I am trying to write a wrapper script for songbook generation using lilypond, latex and sejda-console (for the PDF part). Everything works so far, but I have a problem with sejda that is driving me nuts.
Here is the relevant part of my code:
for %%i in (%f%) do (
sejda-console.bat extractbybookmarks -f ".\%%~ni.pdf" -o "export\%%~ni\%title%.pdf" -l 2 -p [BOOKMARK_NAME] -e "%title%" --overwrite
)
where f is a ";"-separated list of files. This command works for the first file, but fails for all others. I can't find any difference between the commands that sejda receives. Here is my console output:
make_sheet.bat -t "Live it up" --supress *.lytex
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_drums.pdf -o export\book_drums\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Validating parameters.
Starting task (org.sejda.impl.sambox.ExtractByOutlineTask#28701274) execution.
Opening C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\.\book_drums.pdf
Retrieving outline information for level 2 and match regex Live it up
Starting extraction by outline, level 2 and match regex Live it up
Found 0 inherited images and 0 inherited fonts potentially unused
Starting extracting Live it up pages 9 9
Created output temporary buffer C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\.sejdaTmp2789047920522272436.tmp
Appended relevant outline items
Filtering annotations
Skipped acroform merge, nothing to merge
Ending extracting Live it up
Task progress: 0% done
Moving C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\.sejdaTmp2789047920522272436.tmp to C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\Live it up.pdf.
Extraction completed and outputs written to org.sejda.model.output.FileOrDirectoryTaskOutput#478190fc[C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\Live it up.pdf]
Task (org.sejda.impl.sambox.ExtractByOutlineTask#28701274) executed in 0 seconds
Completed execution
C:\Users\skr1_\Desktop\Tools\Songbook\Sample>(sejda-console.bat extractbybookmarks -f ".\book_general.pdf" -o "export\book_general\Live it up.pdf" -l 2 -p [BOOKMARK_NAME] -e "Live it up" --overwrite )
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_general.pdf -o export\book_general\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Invalid value (File '.\book_general.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Invalid value (File '.\book_general.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
C:\Users\skr1_\Desktop\Tools\Songbook\Sample>(sejda-console.bat extractbybookmarks -f ".\book_guitar.pdf" -o "export\book_guitar\Live it up.pdf" -l 2 -p [BOOKMARK_NAME] -e "Live it up" --overwrite )
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_guitar.pdf -o export\book_guitar\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Invalid value (File '.\book_guitar.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Invalid value (File '.\book_guitar.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Even worse, if I copy the commands that sejda receives and paste them as arguments for a new command, everything works fine.
I suspect that something is happening with the working directory in between, but I don't get it.
Also, note that the output includes the command for subsequent passes of the for-loop (starting with "(sejda-console.bat ...") though echo is off. It is not included for the first run, however.
I'm not an expert in programming, especially not in batch, and any help would be much appreciated.
I'd suggest that sejda-console.bat is changing the current directory.
Try
pushd .
call sejda-console.bat ...
popd

Downloading file from online database using bash script

I want to download some files from an online database, but it does not allow me to download all the files at once. Instead, it offers to download a file for each searched keyword. Because I have more than 20000 keywords, this is not feasible for me.
For example, I want to download all the information about miRNA-mRNA interactions from SarBase, but it does not offer an option to download all of them at once.
I wonder how I can download it by writing a script. Can anybody help me?
Make a file called getdb.sh.
#!/bin/bash
echo "Download keywords in kw.txt."
for kw in $(cat kw.txt)
do
    curl "http://www.mirbase.org/cgi-bin/get_seq.pl?acc=$kw" > "$kw.txt"
done
Create another file called kw.txt:
MI0000342
MI0000343
MI0000344
Then run this
$ chmod +x getdb.sh
$ ./getdb.sh
Download keywords in kw.txt.
$ ls -1 *.txt
kw.txt
MI0000342.txt
MI0000343.txt
MI0000344.txt
Another way:
cat kw.txt | xargs -I {} curl -o {}.txt "http://www.mirbase.org/cgi-bin/get_seq.pl?acc={}"
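If the server tolerates it, you can also fetch several keywords at a time by adding -P; this is a sketch assuming GNU xargs:
cat kw.txt | xargs -P 4 -I {} curl -o {}.txt "http://www.mirbase.org/cgi-bin/get_seq.pl?acc={}"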

Getting specific files from server

Using Terminal and shell/Bash commands, is there a way to retrieve specific files from a web directory? I.e.
Directory: www.site.com/samples/
copy all files ending in ".h" into a folder
The folder contains text files and other associated files that are of no use.
Thanks :)
There are multiple ways of achieving this recursively:
1. using find
1.1 making the directories, using mkdir -p to create the nested folders without errors
cd path;
mkdir backup
find www.site.com/samples/ -type d -exec mkdir -p {} backup/{} \;
1.2 finding the specific files and copying them to the backup folder, using cp -p to preserve permissions
find www.site.com/samples/ -name \*.h -exec cp -p {} backup/{} \;
2. using tar, actually for the reverse type of work, i.e. excluding specific files; this matches the part of the question about the unwanted text files better.
You can add as many excludes as you like:
tar --exclude=*.txt --exclude=*.filetype2 --exclude=*.filetype3 -cvzf site-backup.tar.gz www.site.com
mv www.site.com www.site.com.1
tar -xvzf site-backup.tar.gz
You can use wget for that, but only if there are links to those files. I.e. if they exist but are not referenced from any HTML page, then brute force is the only option.
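For the wget approach, something like the following sketch could work, assuming the server exposes a browsable directory listing; -r recurses, -np stays below the starting directory, -nd keeps all files in one local folder, and -A restricts the download to .h files (headers/ is just a hypothetical target folder):
wget -r -np -nd -A '*.h' -P headers/ http://www.site.com/samples/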
cp -aiv /www.site.com/samples/*.h /somefolder/
http://linux.die.net/man/1/cp
