How to avoid sub folders in snowflake copy statement - snowflake-cloud-data-platform

I have a requirement to exclude a certain folder under a prefix and process the remaining data in Snowflake (COPY statement).
In the example below I need to process the files directly under emp/ and exclude the files under emp/abc/.
Input :
s3://bucket1/emp/
E1.CSV
E2.CSV
/abc/E11.csv
s3://bucket1/emp/abc/ - E11.csv
Output :
s3://bucket1/emp/
E1.CSV
E2.CSV
Is there a pattern I could use to handle this?

With the PATTERN keyword you can try to exclude certain files. However, a negated character class ([^...]) works one character at a time: it excludes every file that has one of the listed characters at that position, not just files under a particular folder name.
Assuming your stage URL is defined as s3://bucket1/emp/
LS @MY_STAGE pattern = '[^abc].*';
Excludes anything starting with a, b, or c
LS @MY_STAGE pattern = '[^a][^b][^c][^\\/].*';
Excludes anything where:
The first character is a, OR
The second character is b, OR
The third character is c, OR
The fourth character is a forward slash /
Edit
After testing with Sharvan's example, here is what I've found:
Doesn't work:
ls @my_stage PATTERN='^((?!/abc/).)*$'; because the leading forward slash is already part of the stage URL (it is automatically appended to the stage URL if not present), so the path being matched starts with abc/ rather than /abc/
Works: ls @my_stage PATTERN='^((?!abc/).)*$'; because the first forward slash is removed from the pattern
Updated as the forward slash does not need to be escaped
Snowflake does not support backreferences (per their documentation), but there is no mention of lookaheads or lookbehinds, which I had assumed were unsupported.
https://docs.snowflake.net/manuals/sql-reference/functions-regexp.html#backreferences
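The difference between the two patterns can be checked outside Snowflake. The sketch below is only an approximation: it assumes GNU grep with PCRE support (grep -P) and assumes the pattern is matched against paths relative to the stage URL, as the findings above suggest. The file list is the example from the question.

```shell
#!/usr/bin/env bash
# Paths as seen relative to the stage URL s3://bucket1/emp/ (assumed).
paths='E1.CSV
E2.CSV
abc/E11.csv'

# Keep only paths in which no position starts the text "abc/".
kept=$(printf '%s\n' "$paths" | grep -P '^((?!abc/).)*$')
printf '%s\n' "$kept"

# With a leading slash the lookahead never fires on these relative paths,
# so nothing is excluded.
kept_slash=$(printf '%s\n' "$paths" | grep -P '^((?!/abc/).)*$')
printf '%s\n' "$kept_slash"
```

The first filter keeps E1.CSV and E2.CSV only; the second keeps all three paths, which matches the behaviour reported above.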

Use this to exclude the prefix pattern
ls @stage PATTERN='^((?!/abc/).)*$'

Related

ksh: remove last extension from a multiple extension filename

I have a filename in the format dir1/dir2/filename.txt.org and I would like to rename it to dir1/dir2/filename.txt. How can this be done? I tried 'cut' with '.' as the separator, but it also removes .txt.
You can use Korn shell variable expansion instead of a subprocess (e.g. cut). This can be much faster.
example:
var1=dir1/dir2/filename.txt.org
var2=${var1%.*}
If you now print $var2 its value will be dir1/dir2/filename.txt
The % removes the shortest suffix that matches the pattern .* (a period followed by anything), i.e. the rightmost period and everything after it.
${variable%pattern} - return the value of variable without the smallest ending portion that matches pattern.
Other variable expansion formats are available; it is worthwhile to study the docs.
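As a quick illustration of the related forms, here is a minimal sketch using the same example value. The variable names other than var1 are made up for the example; the four operators themselves are standard in ksh, bash and POSIX sh.

```shell
var1=dir1/dir2/filename.txt.org

short_suffix=${var1%.*}    # shortest suffix matching ".*" removed
long_suffix=${var1%%.*}    # longest suffix matching ".*" removed
short_prefix=${var1#*/}    # shortest prefix matching "*/" removed
long_prefix=${var1##*/}    # longest prefix matching "*/" removed

printf '%s\n' "$short_suffix" "$long_suffix" "$short_prefix" "$long_prefix"
```

This prints dir1/dir2/filename.txt, dir1/dir2/filename, dir2/filename.txt.org and filename.txt.org, one per line.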

stripping text from beginning/end of a variable

I have an array containing a list of backup files, I want to go through and strip off the leading /path/to/file/ and the trailing _date_stamp.tar.gz My code works to strip off the leading pathtofile and if I set it to just strip off the .tar.gz it works, but if I try to strip the date it fails. So as an example I want to take:
/path/to/file/backup_domain1.com_02_16_2015.tar.gz
and be left with:
domain1.com
This removed from start: /path/to/file/backup_
This removed from end: _02_16_2015.tar.gz but obviously as they are date stamped then the integers will vary.
My code snippet:
# strip leading path/to/file :
bubasedir=/path/to/file
buarray=( "${buarray[@]#"$bubasedir/backup_"}" )
buarray=( "${buarray[@]%".tar.gz"}" )
This strips .tar.gz but I need to strip the date as well.
Use a pattern which matches the date stamp, just like you do for the prefix. Assuming the domain name cannot contain an underscore (as per the DNS spec, but sometimes violated for internal domains and special domains like _dkim),
buarray=( "${buarray[@]%%_*}" )
%% says to trim the longest possible match and _* matches everything starting from an underscore. ("${buarray[@]%_*}" would trim from the last underscore.)
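Put together, the two steps might look like the following bash sketch. The second array entry is hypothetical sample data added for illustration; only the first comes from the question.

```shell
#!/usr/bin/env bash
bubasedir=/path/to/file
# Hypothetical sample data; only the first entry comes from the question.
buarray=(
    "$bubasedir/backup_domain1.com_02_16_2015.tar.gz"
    "$bubasedir/backup_example.org_11_03_2014.tar.gz"
)

# Remove the leading directory and "backup_" from every element.
buarray=( "${buarray[@]#"$bubasedir/backup_"}" )
# Remove everything from the first remaining underscore onwards.
buarray=( "${buarray[@]%%_*}" )

printf '%s\n' "${buarray[@]}"
```

This prints domain1.com and example.org. The order matters: the backup_ prefix has to go first, otherwise %%_* would cut at its underscore.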

Writing a batch program for renaming files

Gentlemen and -women, one question regarding programming:
I need to write a program to batch rename files, under the following conditions:
All files are in a directory. Can be the same directory as the executable, if this simplifies things pathwise.
All files are .pdf, but this should not matter I believe.
All files start with a double-digit prefix ranging from 01 to 99
e.g.: 01_FileNameOriginal1.pdf ; 02_FileNameOriginal2.pdf
This double digit needs to stay.
All names must be modified to a predefined range of filenames (sort of a database) which can be embedded in the executable, or read out from an external file (being .txt, .csv, whatever is most feasible).
e.g.: 01_NewName1.pdf ; 02_NewName2.pdf ; ...
Some original filenames contain an expiry date, which is labelled "EXP YYYY MMM DD", and should be appended to the new name in a different format: "e_YYYY_MM_DD". So basically it needs to be able to use a "for" statement (to loop for the number of files), and "if" statement to search for the "EXP" (matching case), cut the string and append in the rearranged format before the file extension.
The result of the program can be either renaming, or returning a renamed copy. The former seems easier to do.
So to summarize:
step 1: Run program
step 2: integer = number of files
step 3: loop:
3.A check first two digits, copy to a temp string.
3.B compare the temp string with an array of predefined filenames. The array can be embedded in the code, or read externally. For the user friendliness sake, an external read from a .csv seems easier to modify the filenames later on.
3.C to rename or not to rename. In case of a match:
Assume the old file has the following name: 01_FileNameOriginal EXP YYYY MM DD Excessivetext.pdf
Copy first two digits to a temp2 string
Scan the old name for EXP (for length of filename, if = "EXP ", matching case) and cut the following 10 characters. Put YYYY, MM, and DD in separate strings. All essential value has now been extracted from the old filename.
Match dbl digits with the first two digits of filenames in the database. Copy the new name to a temp string.
Rename the file with the new name: eg 01_NewName.pdf
Append date strings before the extension: eg 01_NewName_e_YYYY_MM_DD.pdf
Note: date can be extracted in a single string, as long as spaces are replaced by underscores. Probably shorter to program.
in case of no match: copy old filename, and put it in a temp string, to return at the end of the process as an error (or .txt) file, with filenames that could not be renamed.
step 4: finish and return error or report (see previous)
Question:
Based on these conditions, what would be the easiest way to get started? Are there freeware programs that can do such a thing. If not what is the best approach. I have basic programming knowledge (Java/VBA), some small C++ stints but nothing spectacular. I have only programmed in a programming environment and have never produced any executables or batch files or the likes so I don't have any idea how to start atm, but wouldn't mind to give it a shot. As long as it's a guided shot, and not one in the dark cos that's where I am now.
Would love to hear some thoughts on this.
Greetings
Wouter
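One possible starting point for the steps listed above is a small bash script. This is only a sketch under assumptions that are not in the post: the mapping lives in a CSV file with lines such as 01,NewName1 (Unix line endings), the expiry date is numeric in the form EXP YYYY MM DD, and the function names, the names.csv file and the not_renamed.txt report are all hypothetical.

```shell
#!/usr/bin/env bash
# Print the new file name for old name $1 using mapping file $2.
# Returns 1 when the two-digit prefix is not in the mapping.
new_name_for() {
    local old=$1 map=$2
    local prefix=${old:0:2}          # step 3.A: first two digits
    local ext=${old##*.}             # file extension, e.g. pdf
    local key name new="" suffix=""

    # Step 3.B: look the prefix up in the mapping file ("01,NewName1").
    while IFS=, read -r key name || [[ -n $key ]]; do
        if [[ $key == "$prefix" ]]; then
            new=$name
            break
        fi
    done < "$map"
    [[ -n $new ]] || return 1        # no match: the caller reports it

    # Step 3.C: pick up an optional, case-sensitive "EXP YYYY MM DD".
    local re='EXP ([0-9]{4}) ([0-9]{2}) ([0-9]{2})'
    if [[ $old =~ $re ]]; then
        suffix="_e_${BASH_REMATCH[1]}_${BASH_REMATCH[2]}_${BASH_REMATCH[3]}"
    fi

    printf '%s_%s%s.%s\n' "$prefix" "$new" "$suffix" "$ext"
}

# Rename every NN_*.pdf in directory $1 according to mapping file $2.
# Names that cannot be mapped are appended to not_renamed.txt (step 4).
rename_all() {
    local dir=$1 map=$2 f base new
    for f in "$dir"/[0-9][0-9]_*.pdf; do
        [[ -e $f ]] || continue      # the glob matched nothing
        base=${f##*/}
        if new=$(new_name_for "$base" "$map"); then
            # -n: never overwrite an existing file
            [[ $base == "$new" ]] || mv -n -- "$f" "$dir/$new"
        else
            printf '%s\n' "$base" >> "$dir/not_renamed.txt"
        fi
    done
}
```

With the script sourced, rename_all . names.csv would process the current directory. Keeping the name computation in its own function makes it easy to try on a few names before any file is actually renamed.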

How do I find out if a file name has any extension in Unix?

I need to find out if a file or directory name contains any extension in Unix for a Bourne shell scripting.
The logic will be:
If there is a file extension
Remove the extension
And use the file name without the extension
This is my first question in SO so will be great to hear from someone.
The concept of an extension isn't as strictly well-defined as in traditional / toy DOS 8+3 filenames. If you want to find file names containing a dot where the dot is not the first character, try this.
case $filename in
[!.]*.*) filename=${filename%.*};;
esac
This will trim the extension (as per the above definition, starting from the last dot if there are several) from $filename if there is one, otherwise do nothing.
If you will not be processing files whose names might start with a dot, the case is superfluous, as the assignment will also not touch the value if there isn't a dot; but with this belt-and-suspenders example, you can easily pick the approach you prefer, in case you need to extend it, one way or another.
To also handle names that start with a dot but contain another dot later on (so a lone leading dot still does not count as an extension), try the pattern ?*.* instead.
The case expression in pattern ) commands ;; esac syntax may look weird or scary, but it's quite versatile, and well worth learning.
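Wrapped in a function, the case statement above can be tried on a few names. This is a small sketch; the function name strip_ext and the sample names are made up for illustration.

```shell
# strip_ext NAME: print NAME without its last extension, if it has one.
strip_ext() {
    filename=$1
    case $filename in
        [!.]*.*) filename=${filename%.*} ;;
    esac
    printf '%s\n' "$filename"
}

strip_ext report.txt       # report
strip_ext archive.tar.gz   # archive.tar
strip_ext README           # README (no dot, unchanged)
strip_ext .profile         # .profile (leading dot only, unchanged)
```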
I would use a shell-agnostic solution. Running the name through:
cut -d . -f 1
will give you everything up to the first dot ('-d .' sets the delimiter and '-f 1' selects the first field). You can play with the params (try '--complement' to reverse selection) and get pretty much anything you want.
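Note that cutting at the first dot is not the same as removing the last extension. A short comparison, using a sample name chosen for illustration:

```shell
name=filename.txt.org

first_field=$(printf '%s\n' "$name" | cut -d . -f 1)   # text before the first dot
without_last=${name%.*}                                  # last extension removed

printf '%s\n' "$first_field" "$without_last"
```

This prints filename and then filename.txt, so cut drops every extension while the parameter expansion drops only the last one.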

vim, reformat text to initializers

I've a big file with lines that look like
2 No route to specified transit
network
3 No route to destination
i.e. a number at the start of a line followed by a description.
And I'd like to transform that for use as a struct initializer
{2,"No route to specified transit
network"},
{3,"No route to destination"},
How would I do this ?
Try
:%s/^\(\d\+\)\s\(.*\)$/{\1, "\2"},/
This uses search-and-replace and searches for a line starting with a digit, followed by whitespace, followed by arbitrary text until the end of the line. This is replaced by the pattern you specified.
Or, using “more magic” (thanks to Al in the comments):
:%s/\v^(\d+)\s(.*)$/{\1, "\2"},/
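To try the same regular expression outside vim, a roughly equivalent sed command can be used. This is a swapped-in tool, not the vim command itself, and the sketch assumes a sed that supports -E (GNU and BSD sed do). The input is the sample from the question.

```shell
input='2 No route to specified transit
network
3 No route to destination'

# Same idea as the vim substitution: digits, one whitespace, rest of line.
output=$(printf '%s\n' "$input" |
    sed -E 's/^([0-9]+)[[:space:]](.*)$/{\1, "\2"},/')
printf '%s\n' "$output"
```

Lines that start with a number are rewritten; the continuation line "network" does not start with a digit, so it is left untouched and would have to be joined to the previous line separately.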
