Concatenate two gsutil commands

I want to execute two gsutil commands in a single line. How can I achieve that?
For example:
gsutil ls gs://projectname/bucketname/folder1/folder2/filename.png | \
cp gs://projectname/bucketname/folder3/folder4/
I want to find (list) a file in a specific bucket and copy that file to another folder in the bucket. In the above command I'm using the ls (list) and cp (copy) commands, but this is not working as expected.
I'm after something similar to the shell command below, where find uses -exec to hand each result to the next command:
find -type f -path '*schedule*/*' -name "*.png" -exec cp -n {} /tmp/MusicFiles \;
Your early response is highly appreciated. Thanks in advance..!

Oneliner: From gsutil help cp:
-I Causes gsutil to read the list of files or objects to copy from
stdin. This allows you to run a program that generates the list
of files to upload/download.
So: some_program | gsutil -m cp -I gs://my-bucket
Here some_program can be gsutil ls ...
Additionally,
gsutil accepts wildcards, so the command would be something like this:
gsutil cp gs://bucket-name/*schedule*/*/*.png gs://bucketname/folder3/folder4/
Note that you can use a single star * to match within one directory level and a double star ** for recursive wildcards.
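As a rough sketch of the pipeline described above, the listing and the copy can be wrapped in a small shell function. The function name list_and_copy and the GSUTIL variable are hypothetical, introduced here only so the pipeline can be rehearsed against a stand-in command; the -m and -I options are the ones quoted from gsutil help cp.

```sh
# GSUTIL is an assumed override hook (not part of gsutil itself): point it at
# a stand-in command to rehearse the pipeline without touching a bucket.
GSUTIL=${GSUTIL:-gsutil}

# list_and_copy SRC DEST
# Lists the objects matching SRC and feeds that list to "cp -I" on stdin.
list_and_copy() {
    "$GSUTIL" ls "$1" | "$GSUTIL" -m cp -I "$2"
}
```

It would be called as list_and_copy 'gs://bucketname/folder1/folder2/filename.png' gs://bucketname/folder3/folder4/, with any wildcard quoted so that the local shell leaves it for gsutil to expand.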

Related

Adapt file renaming script so that it searches out each file in sub-directories

I have a .csv file which looks something like this:
unnamed_0711-42_p1.mov,day1_0711-42_p1.mov
unnamed_0711-51_p2.mov,day1_0711-51_p2.mov
unnamed_0716-42_p1_2.mov,day1_0716-42_p1_2.mov
unnamed_0716-51_p2_2.mov,day1_0716-51_p2_2.mov
I have written this code to rename files from the name in field 1 (e.g. unnamed_0711-42_p1.mov), to the name in field 2 (e.g. day1_0711-42_p1.mov).
csv=/location/rename.csv
cat "$csv" | while IFS=, read -r -a arr; do mv "${arr[@]}"; done
However, this script only works when it and all the files that need to be renamed are in the same directory. This was okay previously, but now I need to find files in various subdirectories (without adding the full path to my .csv file).
How can I adapt my script so that it searches out the files in subdirectories and then changes the names as before?
A simple way to make this work, though it leads to an inefficient script, is this:
for D in `find . -type d`
do
    (cd "$D" && while IFS=, read -r -a arr; do mv "${arr[@]}"; done < "$csv")
done
This runs your command in every directory under the current directory (csv must be an absolute path, since the loop changes directory), but it runs through the entire list of filenames for every subdirectory, and mv reports an error for each name that is not present there. An alternative would be to search for each file as you process its name:
csv=files.csv
while IFS=, read -ra arr; do
while IFS= read -r -d '' old_file; do
old_dir=$(dirname "$old_file")
mv "$old_file" "$old_dir/${arr[1]}"
done < <(find . -name "${arr[0]}" -print0)
done<"$csv"
This uses find to locate each old filename, then uses dirname to get the directory of the old file (which we need so that mv does not place the renamed file into a different directory).
This will rename every instance of each file (i.e., if unnamed_0711-42_p1.mov appears in multiple subdirectories, each instance will be renamed to day1_0711-42_p1.mov). If you know each file name will only appear once, you can speed things up a bit by adding -quit after -print0 in the find command, so that find stops at the first match.
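For shells without arrays or process substitution, the same find-then-rename idea can be sketched in plain POSIX sh. The function name rename_from_csv and its two arguments are illustrative, not part of the original script, and the sketch assumes file names contain no commas, newlines, or glob characters.

```sh
# rename_from_csv CSV ROOT
# For each "old,new" line in CSV, find files named old anywhere under ROOT
# and rename each one to new inside the directory where it was found.
# Assumes names contain no commas, newlines, or glob characters (* ? [).
rename_from_csv() {
    # "|| [ -n "$old" ]" keeps a final line that lacks a trailing newline.
    while IFS=, read -r old new || [ -n "$old" ]; do
        # Skip blank or malformed lines.
        if [ -z "$old" ] || [ -z "$new" ]; then
            continue
        fi
        find "$2" -type f -name "$old" | while IFS= read -r path; do
            mv "$path" "$(dirname "$path")/$new"
        done
    done < "$1"
}
```

A call such as rename_from_csv /location/rename.csv . would process the current directory tree. Reading find's output line by line is simpler than the -print0 form, at the cost of breaking on file names that contain newlines.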
The script below
while IFS=, read -ra arr # -r to prevent mangling backslashes
do
    find . -type f -name "${arr[0]}" -printf "mv '%p' '%h/${arr[1]}'\n" | bash
done<csvfile
should do it.
See the find manpage to understand what -printf specifiers like %p and %h do.

Run an awk script on every file of a certain type in a directory

I have a directory with several hundred .log files in it, and I have a script to pull some info out of them and print it to an existing file. Running it on one file goes like
awk -f HLGcheck.sh 1-1-1.log >> outputs.txt
and this works fine. I've looked around for several hours online and I can't seem to find a decent way to have it run on all .log files in the directory. Any help from people smarter than me would be appreciated.
Some techniques:
If the awk script can only handle one file at a time, use a for loop or find with -exec ... \;:
find . -name '*.log' -exec awk -f HLGcheck.sh '{}' \; >> outputs.txt
If the awk script can handle multiple files:
awk -f HLGcheck.sh *.log >> outputs.txt
find . -name '*.log' -exec awk -f HLGcheck.sh '{}' \+ >> outputs.txt
bash has a for loop for this purpose:
$ for f in *.log; do your_processing_here; done
Inside the loop, you can refer to the file being processed as $f.
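The for-loop approach can be filled in roughly as follows. The function name process_logs and its three parameters are made up for this sketch; the awk script is passed in as a file, the way HLGcheck.sh is used in the question.

```sh
# process_logs DIR AWK_SCRIPT OUTFILE
# Runs the awk script once per .log file in DIR (not recursive) and appends
# each run's output to OUTFILE.
process_logs() {
    for f in "$1"/*.log; do
        # When nothing matches, the pattern itself is left in $f; skip it.
        [ -e "$f" ] || continue
        awk -f "$2" "$f" >> "$3"
    done
}
```

For the question's case this would be called as process_logs . HLGcheck.sh outputs.txt. The glob expands in sorted order, so the output follows the alphabetical order of the file names.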

Mass rename objects on Google Cloud Storage

Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.
Here is a way to do this in bash, with a line-by-line explanation below:
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG$/.jpg/' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil mv /' | while read -r line; do bash -c "$line"; done
rm src-rename-list.txt dest-rename-list.txt
The solution builds two lists, one of source names and one of destination names (to be used in the "gsutil mv" command):
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG$/.jpg/' src-rename-list.txt > dest-rename-list.txt
The text "gsutil mv " and the two lists are joined line by line with:
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil mv /'
Each resulting line is then run in a while loop:
while read -r line; do bash -c "$line"; done
Lastly, clean up by deleting the temporary files:
rm src-rename-list.txt dest-rename-list.txt
The above has been tested against a working Google Storage bucket.
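A variation that avoids the temporary list files is to build each destination name with shell parameter expansion and only print the commands, so they can be reviewed before anything is renamed. This is a sketch; print_rename_cmds is a hypothetical helper name, and URLs containing spaces would need extra quoting.

```sh
# print_rename_cmds
# Reads object URLs on stdin (one per line, as "gsutil ls" prints them) and
# writes one "gsutil mv SRC DEST" line per .JPG object, for review.
# Nothing is executed here.
print_rename_cmds() {
    while IFS= read -r url; do
        case $url in
            # ${url%.JPG} strips the trailing .JPG before .jpg is appended.
            *.JPG) printf 'gsutil mv %s %s\n' "$url" "${url%.JPG}.jpg" ;;
        esac
    done
}
```

One way to use it is gsutil ls 'gs://bucket_name/*.JPG' | print_rename_cmds > rename-cmds.sh, followed by sh rename-cmds.sh once the generated lines look right.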
https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames
gsutil supports URI wildcards
EDIT
gsutil 3.0 release note
As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...
Do you have directories under the bucket? If so, you may need to go down into each directory or use **.
gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg
or
gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg
EDIT
gsutil doesn't support wildcards in the destination so far (as of 4/12/'14), and neither does the API.
So at this moment you need to retrieve the list of all JPG files and rename each file individually.
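Python example:
import subprocess
# List every .JPG object in the bucket, one URL per line.
files = subprocess.check_output(["gsutil", "ls", "gs://my_bucket/*.JPG"], text=True).splitlines()
for f in files:
    # Replace the trailing "JPG" with "jpg" and rename the object.
    subprocess.call(["gsutil", "mv", f, f[:-3] + "jpg"])
Please note that this can take hours for a large bucket, since every object is renamed by a separate gsutil call.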
gsutil has no built-in way to rename many objects to individually chosen destination names in one parallel operation.
You have two options:
use a dataflow process to do the operation
or
use GNU parallel to launch it using several processes
If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:
First: Make a list of the files you want to copy/rename (a file with source and destination separated by a space or tab), like this:
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
Second: Launch a new compute instance.
Third: Log in to that instance and install GNU Parallel:
sudo apt install parallel
Fourth: Authorize yourself with Google (gcloud auth login), because the service account for the compute instance might not have permission to move/rename the files:
gcloud auth login
Fifth: Run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt
This will run up to 20 gsutil mv operations in parallel.
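The source/destination list from the first step can be generated from a listing rather than written by hand. The sketch below assumes the .JPG to .jpg rename from the question and a single space as the column separator; make_pairs is a hypothetical helper name.

```sh
# make_pairs
# Reads source URLs on stdin and prints "SRC DEST" pairs separated by one
# space, where DEST is SRC with a trailing .JPG changed to .jpg.
# Lines that do not end in .JPG are dropped. Assumes URLs contain no spaces,
# because a space is also the column separator.
make_pairs() {
    awk '/\.JPG$/ { dst = $0; sub(/\.JPG$/, ".jpg", dst); print $0 " " dst }'
}
```

For example, gsutil ls 'gs://origin_bucket/**.JPG' | make_pairs > file_with_source_destination_uris.txt produces a file in the format that the parallel command reads with --colsep ' '.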
Yes, it is possible:
Move/rename objects and/or subdirectories

Looking to take only main folder name within a tarball & match it to folders to see if it's been extracted

I have a situation where I need to keep .tgz files & if they've been extracted, remove the extracted directory & contents.
In all examples, the only top-level directory within the tarball has a different name than the tarball itself:
[host1]$ find / -name "*\#*.tgz" #(has an # symbol somewhere in the name)
/1-#-test.tgz
[host1]$ tar -tzvf /1-#-test.tgz | head -n 1 | awk '{ print $6 }'
TJ #(directory name)
What I'd like to accomplish (pulling my hair out; rusty scripting fingers), is to look at each tarball, see if the corresponding directory name (like above) exists. If it does, echo "rm -rf /directoryname" into an output file for review.
I can read all of the tarballs into an array ... but how to check the directories?
Frustrated & appreciate any help.
Maybe you're looking for something like this:
find / -name "*#*.tgz" | while IFS= read -r line; do
    dir=$(tar ztf "$line" | awk -F/ '{print $1; exit}')
    test -d "$dir" && echo "rm -fr '$dir'"
done
Explanation:
We iterate over the *#*.tgz files found by find with a while loop, line by line
Get the list of files in the tgz file with tar ztf "$line"
Since path components are separated by /, use that as the field separator in awk and print the 1st field, which is the top-level directory. After the print we exit, making this equivalent to, but more efficient than, using head -n1 first
With dir=$(...) we put the entire output of the tar..awk chain, i.e. the 1st field of the first entry in the tar, into the variable dir
We check whether such a directory exists; if it does, we echo an rm command so you can review it and execute it later if it looks good
My original answer used a find ... -exec but I think that's not so good in this particular case:
find / -name "*#*.tgz" -exec \
sh -c 'dir=$(tar ztf "{}" | awk -F/ "{print \$1; exit}");\
test -d "$dir" && echo "rm -fr \"$dir\""' \;
It's not so good because it runs sh for every file, and since we are using {} inside the sh -c command string, we lose the usual benefit of find ... -exec, where special characters in {} are handled correctly.
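The loop above checks for the directory relative to the current working directory. A variant that takes the extraction location as an explicit argument might look like the sketch below; both function names and the two-argument interface are assumptions made for illustration, and file names containing newlines are not handled.

```sh
# top_level_dir TARBALL
# Prints the first path component of the first entry in a gzipped tarball.
top_level_dir() {
    tar tzf "$1" | awk -F/ '{ print $1; exit }'
}

# print_cleanup_cmds SEARCH_ROOT EXTRACT_ROOT
# For every .tgz under SEARCH_ROOT whose top-level directory exists under
# EXTRACT_ROOT, prints (never runs) an rm command for review.
print_cleanup_cmds() {
    find "$1" -type f -name '*.tgz' | while IFS= read -r tarball; do
        dir=$(top_level_dir "$tarball")
        # Archives created with a leading "./" yield "."; never suggest that.
        case $dir in
            ''|.|..) continue ;;
        esac
        if [ -d "$2/$dir" ]; then
            printf "rm -rf '%s'\n" "$2/$dir"
        fi
    done
}
```

Redirecting the output, as in print_cleanup_cmds /archives /extracted > review.txt, gives the file of rm commands for review that the question asks for.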

grepping patterns and deleting files

I have an external file that contains a list of patterns (pattern per line).
pattern1
foo bar
pattern_n
bar
bar foo
I would like to grep all files including the ones within sub-folders using those patterns, if the pattern matches, copy the file to some /tmp/mybackup/ and then delete it. What would be a good way of doing this?
If I understand your problem correctly, you need the following switches to grep:
-R to scan recursively
-l to print only matching filenames
-f to read the patterns from a file
-I to ignore binary files
so:
grep -RlIf patterns-file *
then feed this result to some other utility to perform the backup, e.g. xargs:
grep -RlIf patterns-file * | xargs -I {} mv {} /tmp/backup
or with a loop (this form breaks on file names containing whitespace):
for afile in `grep -RlIf patterns-file *`; do
    mv "$afile" /tmp/backup
done
Try this, which deletes a file only if the copy succeeded:
for x in `fgrep -f patternfile.txt -l -r .`; do cp "$x" /tmp/mybackup && rm "$x"; done
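Putting the pieces together, a copy-then-delete helper could be sketched as below. The name backup_matches and its arguments are hypothetical; the grep options are the ones listed in the first answer (-R and -I are GNU/BSD extensions rather than POSIX). It assumes the patterns file and the backup directory live outside the searched tree, so that neither is matched and moved itself.

```sh
# backup_matches PATTERNS SEARCH_DIR BACKUP_DIR
# Copies every text file under SEARCH_DIR that matches any pattern in PATTERNS
# into BACKUP_DIR, and deletes the original only if the copy succeeded.
# Assumes file names contain no newlines; files that share a base name
# overwrite each other in BACKUP_DIR.
backup_matches() {
    mkdir -p "$3" || return 1
    grep -RlIf "$1" "$2" | while IFS= read -r f; do
        if cp "$f" "$3"/; then
            rm "$f"
        fi
    done
}
```

Reading grep's output line by line keeps file names with spaces intact, which the backtick loops do not. Note that an empty line in the patterns file matches every line, so every text file would be moved.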

Resources